Developing a digital U.S. biological collections national resource: First steps towards a strategic plan

Summary

A strategic plan for a 10-year national effort to digitize and mobilize images and data associated with biological research collections is being developed. The key objective of the plan is to create a publicly available, comprehensive national collections resource that can be used to address a wide range of research questions and serve stakeholders in government agencies, academic institutions, and international biodiversity organizations. A workshop, held at the National Evolutionary Synthesis Center on February 5-7, 2010, drafted the present outline for the digitization and web mobilization of data and images associated with U.S. biological collections. Input from the community is requested as this plan develops to ensure that it builds appropriately on existing projects and reflects the missions and needs of the nation’s diverse biological collections.

Significance of Collections Digitization

Biological collections, gathered over more than two centuries of research and exploration, represent a significant national resource for basic and applied biology, one that has been underutilized in the digital realm. Knowledge of the history of life is accessible only through biological collections of specimens, fossils, tissues, images and other data held in perpetuity by museums, universities, and various state, federal, and non-governmental agencies. Knowledge of biodiversity obtained through collections is critically important for studies of invasive species, biological conservation programs, land management strategies, biotic responses to climate change, the spread of pathogenic organisms, and many other kinds of research and management activities. A coordinated effort is needed to digitize existing biological collections and to mobilize their data and images in a freely available online resource. Recent technological advances in collections digitization, combined with decades of experience and emerging efforts to standardize and integrate data across collections, put the collections community in a position to undertake this work in a concerted way. Such an effort would have major, positive impacts on U.S. scientific achievement and global scientific collaboration.

The Scope of Collections Digitization

Collections digitization is defined broadly to include transcription of the various types of data associated with specimens into electronic format, capture of digital images of specimens, and georeferencing of specimen collection localities. To assess the scope of the undertaking required to digitize the nation’s collections, the collections community has conducted a survey of the number and diversity of specimens held in U.S. collections. The community has also held three workshops on “Future Directions in Biodiversity and Systematics Research”. These efforts, together with two recent reports (1,2), highlight the scale of the challenge, the need to integrate digitized biological data, the need to coordinate the capture of specimen data and images, and the necessity of making specimen data broadly accessible to scientists worldwide. Estimates of collection size range as high as three billion specimens globally, with one billion or more specimens preserved and cared for by U.S. institutions; as many as 90% of these are not accessible online.

No comprehensive strategic plan currently exists for the digitization of the nation’s biological research collections. To be effective, such a plan should be conceived as a grand challenge and undertaken as a unified mission, with a coordinated funding program and a well-designed strategy for execution. Alongside the broader needs for physical care and housing of collections and for support of collections-based research (3,4), it is vital that the U.S. increase the online accessibility of its biological collections through an integrative and focused digitization effort, so that the full value of these national resources can be realized. The plan also calls for the development of cyberinfrastructure to support efficient, standardized capture and mobilization of these data, making the national biological collections resource publicly available for analysis. The present focus of this strategy is on the digitization and mobilization of existing collection data. This initiative would not directly support the development of new collections or the improvement of collections through enhanced infrastructure, curation, or management.

Objectives, Vision and an Outline for Organizing the Effort

The key objective of the plan is to create a publicly available, sustainable and comprehensive national collections resource by digitizing and mobilizing data from the nation’s biological research collections. Some of the desirable features of this new digital collections resource are:

• Images and data from all U.S. biological collections, large and small, integrated in a web-accessible interface using shared standards and formats.

• New web interfaces, visualization and analysis tools, data mining, image analysis, and georeferencing processes developed and made available for using and improving the collections resource.

• The existing massive backlog of undigitized collections digitized and mobilized on the web, with tools, training, and infrastructure in place to prevent such a backlog from recurring.

A suggested framework for the digitization effort is presented here to obtain community feedback on models for developing a biological collections digitization initiative.

Three tiers of effort that will accomplish this objective have been identified:

1. Develop a coordinated effort to provide technological support for the nationwide collections digitization effort, to link new efforts with existing collections-based projects and international initiatives, and to disseminate standards, techniques and best practices. This effort might take the form of a new center based at a single institution, a collaborative administrative group spanning several institutions, or another model that achieves the same function.

2. Develop a network of regional collaborations for collection digitization across the U.S. These regional efforts might bring together institutions in the same region, housing both large and small collections, to focus on digitization and web mobilization of their collections as contributions to the national collections resource.

3. Develop investigator-driven and cross-regional collaborations motivated by the specific needs of collections of a particular clade or preservation type, or by a particular scientific question to be addressed using collections images and data.

Strategy for Community Involvement

Creating a national digitized biological collections resource requires a strategic plan with broad support and input from the collections community and a diversity of stakeholders; such a plan, incorporating community suggestions, will be the product of this effort. The mechanism for community participation in this planning includes wide distribution of the present outline to institutions, agencies, and professional societies. Community feedback on the initiative outlined here is critical. Feedback can be submitted by adding a comment on the blog page (http://digbiocol.wordpress.com/), sending an email to wg-digitization@nescent.org, or contacting individual participants in the recent meeting (www.nescent.org/wg_digitization/Main_Page). Group feedback based on institutional priorities or taxon-based needs is welcome. Specific feedback is needed in areas such as support for the proposed model, suggestions for revision, ideas regarding the three-tiered approach suggested here, priorities for collection digitization, and ways to maximize collaboration across institutions, across federal agencies, and at the international level. Responses collected through email and blog commentary will be aggregated and provided to participants in future planning sessions, which will use them to complete the final strategic plan.

References

1. Report from the National Science Foundation based on a survey of collections that had received federal support for projects over the past twenty years: http://www.nsf.gov/pubs/2009/nsf09044/nsf09044.pdf

2. Report from OSTP and the Interagency Working Group on Scientific Collections based on the survey of federally-held collections: http://www.nescent.org/wg/digitization/images/d/d1/Collections2.pdf

3. Stevenson, J. W. and D. W. Stevenson. 2003. Development of a national systematics infrastructure: a virtual instrument for the 21st century. Report to the National Science Foundation, Biodiversity Surveys and Inventories Program. New York, December, 2003.

4. Page, L., Funk, V., Jeffords, M., Lipscomb, D., Mares, M., and A. Prather. 2004. Workshop to produce a decadal vision for taxonomy and natural history collections, Gainesville, November 2003. Report to the National Science Foundation, Biodiversity Surveys and Inventories Program, Gainesville, November, 2003.