The data provenance, version control and configuration management white paper is now available for discussion. When posting, please remember to follow the house rules. Please also take time to read the full PDF before commenting and, where possible, refer to section titles, pages and line numbers so that your comment is easy to cross-reference with the document.
The recommendations are reproduced below:
• As an outcome of the workshop, there should be a clear definition of primary (Level 0) and secondary (Level 1) source databases across the spectrum of observing systems which may contribute data to the land surface temperature database.
• We should establish a coordinated international search and rescue of Level 0, primary-source climate data and metadata, both documentary and electronic (see wp3). This effort would recognize and support similar ongoing national projects. Once the material is located, the project should (a) provide, if necessary, a secure storage facility for these documents or hard copies of them, (b) create, where appropriate, digital images of the documents for the archive to meet traceability and authenticity requirements, (c) key documentary information into digital files (native format in Level 1 and uniform format in Level 2), (d) archive, test and quality-assure the raw data files, technical manuals and conversion algorithms needed to understand how the geophysical variable may be unpacked and generated from electronic instrumentation, and (e) securely archive the files for public access and use.
• A certification panel will be selected to rate the authenticity of source material relative to the “primary source”, i.e. to certify a level of confidence that the Level 1 data, as archived, represent the original values from the Level 0 primary source. The process will often be dynamic, since we anticipate that new information will always become available to confirm or cast doubt on the current authenticity rating.
• Given the extent of this project and the unpredictable evolution of the archive, an active panel will be needed to address version-control issues as they arise. The panel will investigate the possibility of using commercial off-the-shelf or open-source version control software for electronic files and software code (e.g. Subversion, http://subversion.apache.org/).
• Since one requirement of this project is to preserve older versions of the archive, and since a considerable amount of tedious research will be performed on any one version, it is generally assumed that up-versioning of the basic Level 2 digital archive will be performed as sparingly as possible.
• The algorithms that produce the datasets used for testing and the datasets themselves must be documented and version-controlled.
• A configuration management board will be selected to initially define the necessary infrastructure, formats and other aspects of archive practices. A permanent board will then be selected to oversee the operation. This board and the version-control panel may be coincident or at least overlapping in membership.
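To make the Level 0 / Level 1 / Level 2 chain and the evolving authenticity rating more concrete for discussion, here is a minimal Python sketch of what a provenance record might look like. It is purely illustrative: the white paper does not prescribe any schema, so the class name `ProvenanceRecord`, every field name, the integer rating scale and the choice of SHA-256 checksums are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of the archived bytes as a hex string."""
    return hashlib.sha256(payload).hexdigest()


@dataclass
class ProvenanceRecord:
    # All names below are hypothetical; none come from the white paper.
    station_id: str
    level: int                 # 0 = primary source, 1 = native format, 2 = uniform format
    checksum: str              # SHA-256 of the archived file
    parent_checksum: Optional[str] = None  # checksum of the item this one was derived from
    ratings: List[Tuple[str, int]] = field(default_factory=list)  # (date, rating) history

    def rate(self, date: str, rating: int) -> None:
        """Append a rating rather than overwrite it, so earlier ratings stay on record."""
        self.ratings.append((date, rating))

    def current_rating(self) -> Optional[int]:
        """Return the most recent rating, or None if the item is unrated."""
        return self.ratings[-1][1] if self.ratings else None


def derive(parent: ProvenanceRecord, payload: bytes) -> ProvenanceRecord:
    """Create the next-level record, linked to its parent by checksum."""
    return ProvenanceRecord(
        station_id=parent.station_id,
        level=parent.level + 1,
        checksum=sha256_hex(payload),
        parent_checksum=parent.checksum,
    )


def verify(record: ProvenanceRecord, payload: bytes) -> bool:
    """Check that a file's bytes still match the checksum stored in its record."""
    return record.checksum == sha256_hex(payload)
```

In this sketch a Level 1 file keyed from a scanned Level 0 document carries the checksum of that scan, so any later change to either file is detectable, and the certification panel's ratings accumulate as a history instead of replacing one another.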