Improve the flow of biological information

Google's ability to store petabytes of data and serve content globally is difficult to match, and biodiversity institutions want to focus their funds on domain-specific data curation and application functionality. Here are some ideas for how to leverage Google's tools to get critical biodiversity information onto the web, build better biodiversity information applications, and help biodiversity researchers find the information they are looking for.

  • How do I tell Google about our data?
  • How can we improve the experience of searching for our information on Google?
  • Can Google help us store and serve our large data sets?
  • Digitizing taxonomic literature: what's the relationship between Google Books and the proposed Biodiversity Heritage Library?
  • Mapping biodiversity data

Although I work for Google, this page is my own, and I may not have everything right. You should check the terms of service for any Google services and make sure they are a fit for you before using them.

In all of these cases I'm interested in how well these ideas work for you and what you make. Let me know!

How do I tell Google about our data?

I'm still working on the best answer for this, but good proximate answers currently include using the Google Base API to tell Google's index about your structured data, and using that structured data to build Google Co-op features.

Let me know how these efforts go for you.


How can we improve the experience people have searching for our information on Google?

Google Co-op is a set of tools that allow individuals or communities to develop their own search features and search experiences.

Host a custom search engine on your website

Your website visitors can use a search box powered by Google that presents results from a list of sources you specify. For example, a custom search engine might contain only scientifically accurate resources on evolution, or offer search across the multiple websites that together make up a conservation project partnership.

Let Google searchers know when you have results for them

Use Google Co-op to make your web data applications available to your membership directly from their Google.com searches. For example, the IUCN and the Consortium for the Barcode of Life have used Co-op’s Subscribed Links feature to provide data and links to results from their own datasets at the top of search results when subscribers search on Google.

Karl Fogel has written a nice narrative explanation of the potential of Google Co-op's Subscribed Links, with an Encyclopedia of Life-type project as an example.

Developing a search feature with Co-op is another way to tell Google about your structured data. Give Google a list!

Alert Google to structured information

You can also use Google Base simply to let Google know that your structured data exists, so that searches for it will return better results. Use the API, or the bulk upload feature for spreadsheets. Create a custom data type for your data set, and provide either the full data records you wish to share or enough of each record to create a Base listing that points to your institution’s website, where the rest of the data can be found.
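The "enough of the record" option can be sketched as a small export step. This is a minimal sketch, assuming the bulk upload accepts tab-delimited text with a header row of attribute names, as spreadsheet exports usually are. The column names, the example.org link and the sample record are all hypothetical; check the attribute names your own custom data type actually uses.

```python
import csv
import io

# Hypothetical column names for a summary listing (not confirmed Base attributes).
COLUMNS = ["title", "link", "scientific_name", "latitude", "longitude"]


def to_bulk_tsv(records, columns=COLUMNS):
    """Write records as tab-delimited text with a header row.

    Fields not listed in `columns` (such as internal notes) are dropped, so the
    upload carries only enough of each record to point back to the full record
    on the institution's own website.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t",
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical sample record; the catalog number and URL are placeholders.
records = [
    {"title": "Coccyzus americanus specimen EXAMPLE-0001",
     "link": "https://collections.example.org/specimens/EXAMPLE-0001",
     "scientific_name": "Coccyzus americanus",
     "latitude": 37.87, "longitude": -122.26,
     "preparator_notes": "internal only"},
]
print(to_bulk_tsv(records))
```

Keeping the export a pure function of your records makes it easy to regenerate the upload file whenever the underlying collection database changes.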


Can Google help us store and serve our large data sets?

Host data using Google Base API

Google Base has an API that allows you to feed structured information into our storage system and pull it back out again. Instead of building your own server clusters and back-up systems, you can focus your dollars on curating the data and developing impressive user interfaces to your biodiversity information. I believe you can choose whether you want Google to index your data or whether your data will be available only to your own application.

As you explore, ignore the current public face of Google Base. The single-listing form and the current search UI are very listing-centric and not helpful for this community's data. The API provides you direct access to the potential of Google Base.

Here's a simple diagram describing key activities involved in getting biodiversity data online. Google Base can help with the middle section, storing and serving data through Google's global reach and serving infrastructure.

Some projects looking into using Google Base for data caches include MaNIS and Arctos. Tell me if you use Google Base so I can share your example here.


Digitizing taxonomic literature

Literature: Digitizing, Indexing, Hosting, and Selling using Google Books

Museums that publish or have published books, journals, and other copyrighted materials can participate in books.google.com, making the scanned material accessible at a level the publisher feels comfortable with. With the upcoming Online Access program, researchers around the world will be able to pay to view a journal they need for their taxonomic work. Google provides the digitization and indexing. Collectively, museum by museum, this can help bring taxonomic literature online.

For example, searching Google Books for the Yellow-Billed Cuckoo (Coccyzus americanus) finds references in the Bulletin of the Smithsonian and in the Proceedings of the Zoological Society of London. Their publishers have arranged for only small pieces of the text to be viewable: enough to find the reference and decide whether you want the rest. In addition, Argentine Ornithology, a book from 1888, is fully available online for any researcher to view.

Taxonomic literature is some of the most valuable and often-used older literature on library shelves. If your institution is a partner with Google in scanning library books, see what you can do to get older taxonomic literature moved to the front of the priority queue.

What's the relationship between Google Books and the proposed Biodiversity Heritage Library?

If the Biodiversity Heritage Library is funded, it will be very exciting for this community. It will have an open-access data use model that supports the community building various applications upon the data, and the project itself is planning to build an exciting page viewer that will serve taxonomists.

There are various aspects of making taxonomic literature available online, including digitizing it, providing search interfaces for it, and actually viewing the literature. Here is my take on the relative roles that Google Books and the BHL could play in digitizing the literature. "Open access" is the BHL data use model. "Free access" means the scanned literature is available for free. "Paid access" means the scanned literature is available for a cost. Note that for all literature still under copyright, the copyright holder must choose the access model under which their material is scanned.

I see room for Google Books to play a key role in providing a central place for people to search for and find all taxonomic literature. It would be a shame if one had to search in multiple places to get coverage for all the digitized literature in this domain.

There's also a lot more relevant biological literature than just the taxonomic literature. If the BHL and the Google Books Library Program partners coordinate their digitizing queues to avoid overlap, and publishers of biological materials use the Google Books Publisher program, biological literature as a whole will come online more quickly.

Mapping Biodiversity Data

Google has lots of cool things you can do with geographically enabled data, for free. Here are some examples of what's possible - and what's already being done.

Google Earth's KML files display on Maps

Here's an example where GBIF's KML download file is being displayed on Google Maps. The same file could work both on the web and in the Google Earth client. Pasting the URL of a KML file into the Google Maps search box will display it on the map within the browser, no client download required. Basic map data can be dynamic and viewable without a mapserver.
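For a sense of what such a file contains, here is a minimal sketch that writes occurrence points as KML placemarks using only Python's standard library. It assumes the current OGC KML 2.2 namespace (older Google Earth files used a Google-specific namespace), and the point data passed in is hypothetical. Note that KML orders coordinates as longitude,latitude, the reverse of the usual convention, which is a common source of misplaced points.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def occurrences_to_kml(points):
    """Build a KML document with one Placemark per (name, lat, lon) tuple."""
    kml = ET.Element("kml", {"xmlns": KML_NS})
    doc = ET.SubElement(kml, "Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = name
        point = ET.SubElement(placemark, "Point")
        # KML expects "longitude,latitude", not "latitude,longitude".
        ET.SubElement(point, "coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


print(occurrences_to_kml([("Example record", 10.5, -20.25)]))
```

Because the output is a plain static file, it can be served from any web server and opened in either Google Earth or Google Maps.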

Google Maps API: BerkeleyMapper

BerkeleyMapper is a service that uses the Google Maps API to give museums' web search interfaces a way to display geocoded data points. BerkeleyMapper adds important features such as point uncertainty and additional layer types.
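Point uncertainty is usually recorded as a radius in metres around a georeferenced locality. One way to draw it is to approximate that circle as a polygon. The sketch below is not BerkeleyMapper's implementation. It simply applies the standard spherical "destination point" formula, assuming a mean Earth radius of 6,371 km, and the function name and vertex count are illustrative choices.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation


def uncertainty_ring(lat, lon, radius_m, n=36):
    """Return n+1 (lat, lon) vertices approximating a circle of radius_m metres
    around (lat, lon). The last vertex repeats the first to close the ring."""
    lat1 = math.radians(lat)
    lon1 = math.radians(lon)
    d = radius_m / EARTH_RADIUS_M  # angular distance in radians
    ring = []
    for i in range(n):
        bearing = 2 * math.pi * i / n  # 0 = north, increasing clockwise
        lat2 = math.asin(math.sin(lat1) * math.cos(d) +
                         math.cos(lat1) * math.sin(d) * math.cos(bearing))
        lon2 = lon1 + math.atan2(math.sin(bearing) * math.sin(d) * math.cos(lat1),
                                 math.cos(d) - math.sin(lat1) * math.sin(lat2))
        # Normalize longitude into [-180, 180) in case the ring crosses the antimeridian.
        lon2_deg = (math.degrees(lon2) + 540) % 360 - 180
        ring.append((math.degrees(lat2), lon2_deg))
    ring.append(ring[0])
    return ring
```

The resulting vertices can be drawn as a polygon beside the point marker, so users see both where a specimen was recorded and how precisely.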

Google Maps API: AntWeb comparative collection maps

Not only does AntWeb use Google Maps API to display collection localities throughout its taxonomic listing pages, it even uses multiple instances of the map to facilitate comparison of the localities for different species in an entire genus. And if a new record becomes part of AntWeb, the maps are automatically updated.
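The data side of a per-species comparison can be sketched as grouping a genus's records by species and computing each species' bounding box, which could then set the initial view of its own map. This is a hypothetical illustration, not AntWeb's code. It ignores ranges that cross the antimeridian, and the species labels in any input are placeholders.

```python
from collections import defaultdict


def species_extents(records):
    """Group (species, lat, lon) records by species.

    Returns {species: {"points": [(lat, lon), ...],
                       "bbox": (south, west, north, east)}}.
    Ranges crossing the antimeridian are not handled.
    """
    groups = defaultdict(list)
    for species, lat, lon in records:
        groups[species].append((lat, lon))
    extents = {}
    for species, pts in groups.items():
        lats = [p[0] for p in pts]
        lons = [p[1] for p in pts]
        extents[species] = {"points": pts,
                            "bbox": (min(lats), min(lons), max(lats), max(lons))}
    return extents
```

Because the grouping is recomputed from the current records each time, a new record shows up on the right species map without any manual step.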

Google Maps API: OBIS geographic species search

OBIS' Google Maps mash-up allows you to select a 0.5-degree square on Earth and retrieve a species list. What a great entrance into their species information!
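The core of a grid-square species search can be sketched in a few lines. The sketch works out which 0.5-degree cell a clicked point falls in, then lists the distinct species with records in that cell. It assumes cells aligned to whole and half degrees and counted from (-90, -180). OBIS's own grid and data access may differ, and the function names are hypothetical.

```python
import math

CELL_DEG = 0.5


def cell_of(lat, lon, size=CELL_DEG):
    """Return (row, col) of the size-degree cell containing (lat, lon),
    counting rows from -90 latitude and columns from -180 longitude."""
    rows = int(180 / size)
    cols = int(360 / size)
    row = min(math.floor((lat + 90) / size), rows - 1)  # keep lat = 90 in the top row
    col = math.floor((lon + 180) / size) % cols          # lon = 180 wraps to -180
    return row, col


def species_in_cell(records, lat, lon, size=CELL_DEG):
    """Sorted distinct species among (species, lat, lon) records that fall
    in the same cell as the selected point."""
    target = cell_of(lat, lon, size)
    return sorted({sp for sp, rlat, rlon in records
                   if cell_of(rlat, rlon, size) == target})
```

In a real service the cell index would more likely be precomputed and stored with each record, so that the lookup becomes a simple query rather than a scan.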

Google Earth: Features for Biodiversity Applications

The latest version (4) of Google Earth includes support for WMS image and data overlays. It also has a new widget for interacting with the time component in your data; see an example KML for a geo-tagged whale shark as it is tracked through the Indian Ocean. Version 4 also improves how you interact with clusters of placemarks at the same or nearby locations.
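The time widget works with KML's time elements. The minimal sketch below gives each tracking fix its own Placemark carrying a `<TimeStamp><when>` value, and Google Earth's time slider can then step through the track. The fixes you pass in would come from your own tracking data. The namespace assumed is OGC KML 2.2, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def track_to_kml(fixes):
    """Build KML from (iso_time, lat, lon) fixes, one time-stamped Placemark each."""
    kml = ET.Element("kml", {"xmlns": KML_NS})
    doc = ET.SubElement(kml, "Document")
    for when, lat, lon in fixes:
        placemark = ET.SubElement(doc, "Placemark")
        stamp = ET.SubElement(placemark, "TimeStamp")
        ET.SubElement(stamp, "when").text = when  # e.g. "2006-01-01T00:00:00Z"
        point = ET.SubElement(placemark, "Point")
        ET.SubElement(point, "coordinates").text = f"{lon},{lat}"  # lon,lat order
    return ET.tostring(kml, encoding="unicode")
```

KML also has a `TimeSpan` element with `begin` and `end` children, which suits observations that cover an interval rather than a single instant.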

I'm looking forward to seeing what the biodiversity informatics community can do with these new features...

Google Earth: AntWeb community layer

Sometimes really amazing KML files created by other organizations become featured layers in Google Earth. An example of this is the AntWeb community layer, based on the georeferenced specimens from AntWeb and featuring the fantastic close-up pictures of ant species. To see it, download Google Earth and open the Google Earth Community folder in the Layers panel at the bottom.

Google Earth: UNEP featured content and Image Overlays

Poke around in the Featured Content folder too. In October, the United Nations Environment Program was featured, with pictures of places around the Earth about 30 years ago and again today. A cool feature is being able to click the link within the information bubble and overlay the pictures onto Google Earth. Now twiddle the slider in the left panel and watch one image fade into the other.
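Image overlays like these are KML GroundOverlay elements. An image is pinned to a latitude/longitude box, and its starting transparency can be set through the alpha byte of the KML `color` value, written as `aabbggrr` hex. The sketch below builds one overlay. The image URL and box edges you pass in are hypothetical, and the interactive slider in Google Earth still adjusts transparency after loading.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def ground_overlay_kml(name, href, north, south, east, west, opacity=1.0):
    """Build a KML GroundOverlay draping the image at `href` over a lat/lon box."""
    alpha = round(max(0.0, min(1.0, opacity)) * 255)
    kml = ET.Element("kml", {"xmlns": KML_NS})
    overlay = ET.SubElement(kml, "GroundOverlay")
    ET.SubElement(overlay, "name").text = name
    # KML colors are aabbggrr; white with an alpha byte only sets transparency.
    ET.SubElement(overlay, "color").text = f"{alpha:02x}ffffff"
    icon = ET.SubElement(overlay, "Icon")
    ET.SubElement(icon, "href").text = href
    box = ET.SubElement(overlay, "LatLonBox")
    for tag, value in (("north", north), ("south", south),
                       ("east", east), ("west", west)):
        ET.SubElement(box, tag).text = str(value)
    return ET.tostring(kml, encoding="unicode")
```

Loading a "then" overlay and a "now" overlay over the same box reproduces the before-and-after comparison described above.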


Other possibilities to try or watch

Consider Google Docs & Spreadsheets when writing collaborative grant proposals.

It provides a web-based, version-controlled, access-controlled application for writing prose or managing spreadsheet data. Think about crafting grant narratives and honing budgets without the version confusion of passing files around by e-mail.

Google Apps for Your Domain offers your domain's users the interfaces of Gmail, Google Calendar, and other applications.


Host videos on video.google.com

The Bishop Museum is trying this. Videos are presented in a compressed format for good playback.

Host images using Picasa web album

I can provide you an invitation. Pictures are presented in a browser-friendly size. I'll be happy to work with you to find institution-scale solutions to the current quotas.