Dark Data and ANR

Understanding the ecological impacts of a changing climate in California will require research that synthesizes historical and contemporary data. Detailed records of past agricultural, ecological, and climate conditions exist across the state in several forms, including paper archives, historic imagery, biological specimens, and digital data. Unfortunately, these materials are often hidden, many are disorganized or degraded, and some exist as “dark archives” that are currently invisible to researchers. The ANR Research and Extension Centers (RECs) provide an example of both the challenges we face and the potential benefits we might reap in bringing these data to light. The diversity of REC sites and the wealth of historical research and collections make our efforts in dark data recovery significant: we need to gather and organize historical materials in ways that make them visible to researchers and usable as research data into the future.

The UC ANR IGIS statewide program has recently created a database, the IGIS InfoBase, for archiving historical and current research documents and data from the Research and Extension Center (REC) sites. Currently, contributions from the RECs to InfoBase include PDFs of publications and site descriptions; Excel documents of tabular research data; GIS shapefiles; imagery (including both aerial imagery and ground-level photographs); and metadata, schemas, and/or abstracts that describe the datasets. InfoBase’s web portal allows users to explore, view, and link the content of the database both spatially, using a map tool, and categorically, using keyword searches. Some biological collections from the RECs are also being added to the database. For example, a recent grant led by the Berkeley Initiative on Global Change Biology has supported the digitization of biological collections from Hopland REC. Other key historical datasets are also being made available. The Wieslander Vegetation Type Mapping Project has digitized hundreds of photographs, maps, and other records from statewide ecological surveys conducted during the 1920s and 1930s, and these data have been made available to researchers through an API. Collectively, these data are vast and full of research potential. One recent effort, for instance, documented changes in forest structure over 80 years in California through a synthesis of historical and contemporary forest structure data (McIntyre et al. 2015). However, in many cases these data are not available in the format required for analysis and research, and in other cases they are difficult to find, as our best record of them might be a scanned research report.

The ANR Dark Data Initiative would focus on the following questions:

1.  What historical materials are available in the REC system for research on the impact of a changing climate on ecological, biological and agricultural systems?

2.  What key cases of historic research need to be captured to ensure scientific progress?

3.  What are the best protocols to find, digitize, display, and promote the use of these materials?

The overall goals of this effort would be to 1) highlight three key cases where data can be located, digitized, added to InfoBase, and made widely available to researchers; and 2) develop a protocol for moving forward should additional funding be secured from external sources.