Tuesday, November 30, 2010
First teleconference of steering committee
Notes are posted here. They provide an update on the initial steps undertaken thus far.
Wednesday, November 3, 2010
Databank working group - first teleconference
The databank working group recently held their first teleconference. Notes arising from this are available here.
Comments are welcome, but moderation will be patchy at best until the 15th, so please be patient.
Tuesday, November 2, 2010
Some additional perspectives
A couple of additional perspectives have recently been published that more formally outline the outcomes of the meeting. These still fall short of full expositions of the outcomes in formal publications (at least two of which, and possibly more, are in the works).
Significance magazine online - a new online venture by the American Statistical Association and the Royal Statistical Society.
climatecentral
In addition, activities have started to spin up in response to the meeting; it is hoped that information on these will be posted fairly shortly.
Comments are welcome but should follow the comments policy.
Monday, September 20, 2010
A few perspectives
Several active bloggers were at the meeting and posts relating to their views are linked here. This list will be updated if and when I am alerted to any additional posts elsewhere but will be limited to views from workshop participants. Please comment on their comment threads as well as here.
Serendipity - Steve Easterbrook (software expert)
http://www.easterbrook.ca/steve/?p=1913
Bryan Lawrence (British Atmospheric Data Centre)
http://home.badc.rl.ac.uk/lawrence/blog/2010/09/10/surface_temperature_workshop
Clear Climate Code - Nick Barnes (software expert)
http://clearclimatecode.org/surface-temperatures-workshop/
Protons for breakfast - Michael de Podesta (metrologist)
http://protonsforbreakfast.wordpress.com/2010/09/11/surface-temperature-workshop/
http://protonsforbreakfast.wordpress.com/2010/09/12/surface-temperature-data/
Monday, September 13, 2010
Exeter workshop presentations available
I am in the process of uploading all presentations to the surfacetemperatures.org site at http://www.surfacetemperatures.org/exeterworkshop2010. I will also post the agreed outcomes, hopefully within the week, and revamp the front page to reflect them. Like everybody else I have a day job, so please be patient whilst I make these changes.
In the longer term we hope to publish highlights in BAMS and as a longer WMO Tech Doc, but this will take some time and will require consultation with the organisers and rapporteurs and vetting by participants.
Wednesday, September 1, 2010
Thanks for input - how this will be considered
Thanks on behalf of the entire organising committee to all those who have taken the time and effort to comment constructively upon the white papers. I am currently in the process of parsing out all comments to the most relevant white papers. In some cases this will involve your post being considered by multiple breakout groups. Within each breakout group we will start the discussion by reading the pertinent comments received before debating amongst those present. Several copies of your remarks will also be available in hard copy for breakout groups to refer to. Breakout chairs have been tasked with ensuring that your comments are adequately considered. There will then be plenary discussions before final decisions are taken. Workshop participants have been urged to read the blog postings that are relevant to their assigned breakout groups in advance of the meeting.
I will be closing comments on all remaining posts at 12Z on 9/2. I will keep comments open on this thread until further notice (comments must still follow the house rules to be considered), but I will have limited connectivity, and therefore limited ability to moderate comments, until the week after the workshop, so any comments received after 9/2 may not be processed until then.
After the meeting next steps will be much clearer and will start to be articulated in a journal article and a technical document although clearly no concrete dates can be given for these appearing. We also hope to emerge from the meeting with some concrete next steps which will be articulated either here or more likely on the surfacetemperatures.org site as soon as details are confirmed.
Tuesday, August 10, 2010
White paper 14 - solicitation of input
The solicitation of input from the community at large including non-climate fields and discussion of web presence white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and where possible refer to one or more of section titles, pages and line numbers to make it easy to cross-reference your comment with the document.
There are no specific recommendations associated with this white paper. It is more a discussion document of options.
Update 8/25: Please associate comments with the appropriate white paper.
Friday, July 30, 2010
White paper 16 - Interactions with other activities
The interactions with other activities white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and where possible refer to one or more of section titles, pages and line numbers to make it easy to cross-reference your comment with the document.
The recommendations are reproduced below:
1) The development of a land surface temperature databank should allow for the extension to multivariate datasets and compatibility with global databanks from the outset.
2) Digitization activities should take a multivariate approach and always incorporate all metadata.
3) Emerging links between land and ocean dataset developers and researchers should be fostered and facilitated by the development of compatible databanks, data products and joint research projects.
4) Appropriate linkage to the activities supported by space agencies aimed at the generation of long-term climate data records addressing the GCOS Essential Climate Variables should be established when developing the surface temperature databank. The experience of GHRSST in the combined management and use of satellite and in situ data should be exploited.
5) Research is needed to improve multivariate analysis methods and to develop techniques to produce consistent global data products.
6) The important role of reanalysis in providing global multivariate analyses with wide application including quality assessment should be recognized. These products will form an essential part of a successful surface temperatures project providing both a set of estimates and a wealth of metadata regarding the data quality.
7) Funding agencies should recognize that an internationally coordinated and sustained approach to the development, maintenance and improvement of climate databanks and derived data products will have wide benefits. This most logically includes a CMIP type portal for climate data records from all observing platforms with common formats and strong naming conventions to enable ease of intercomparison.
White paper 10 - Dataset algorithm performance assessment based upon all efforts
The dataset algorithm performance assessment based upon all efforts white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and where possible refer to one or more of section titles, pages and line numbers to make it easy to cross-reference your comment with the document.
The recommendations are reproduced below:
• Assessment criteria should be developed entirely independently of the dataset developers and should be pre-determined and documented in advance of any tests.
• It is crucial that the purpose to which a dataset could be put be identified and that a corresponding set of assessment criteria are derived that are suitable for that purpose.
• The output of an assessment should be to determine whether a dataset is fit for a particular purpose and to enable users to determine which datasets are most suitable for their needs. Outputs should be clearly documented in such a form as to enable a clear decision tree for users.
• Validation of an algorithm should always be carried out on a different dataset from that used to develop and tune the algorithm (a minimal illustration follows this list).
• A key issue is to determine how well uncertainty estimates in datasets represent a measure of the difference between the derived value and the “true” real world value.
• It would be worthwhile to consider the future needs for the development of climate services by identifying an appropriate set of regions or stations that any assessment should include.
• New efforts resulting from this initiative should be coordinated with on-going regional and national activities to rescue and homogenize data.
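To make the independent-validation recommendation concrete, here is a minimal Python sketch. The station identifiers, the toy constant-offset "algorithm" and the RMSE score are assumptions purely for illustration; they are not part of the white paper or any agreed assessment protocol.

```python
# Sketch only: score an adjustment algorithm on stations withheld from its
# development, per the recommendation above.  Station IDs, the toy "algorithm"
# and the RMSE metric are illustrative assumptions, not part of the white paper.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical monthly anomaly series for ten stations, each with a spurious offset.
stations = {f"ST{i:03d}": rng.normal(0.4, 0.5, 120) for i in range(10)}

# Withhold half of the stations for validation; develop/tune only on the rest.
ids = sorted(stations)
rng.shuffle(ids)
develop_ids, validate_ids = ids[:5], ids[5:]

# "Tune" the toy algorithm (a constant-offset correction) on the development set only.
offset = np.mean([stations[s].mean() for s in develop_ids])

# Score on the withheld stations with a pre-agreed metric (RMSE of residual bias here).
residual_bias = [np.mean(stations[s] - offset) for s in validate_ids]
rmse = float(np.sqrt(np.mean(np.square(residual_bias))))
print(f"Residual bias RMSE on withheld stations: {rmse:.3f} degC")
```

The point is simply that the tuning data and the scoring data never overlap, and that the metric is fixed before the algorithm sees either.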
Thursday, July 29, 2010
White paper 11 - Spatial and temporal interpolation of environmental data
The spatial and temporal interpolation of environmental data white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and where possible refer to one or more of section titles, pages and line numbers to make it easy to cross-reference your comment with the document.
Update 8/2: A supplement has also been published for comment / consideration. Please be sure to delineate in your comments whether you are discussing the main white paper or the supplement.
The recommendations from the main white paper are reproduced below:
• The choice of interpolation technique for a particular application should be guided by a full characterization of the input observations and the field to be analyzed. No single technique can be universally applied. It is likely that different techniques will work best for different variables, and it is likely that these techniques will differ on different time scales.
• Data transformations should be used where appropriate to enhance interpolation skill. In many cases, the simple transformation of the input data by calculating anomalies from a common base period will produce improved analyses (a minimal sketch follows this list). In many climate studies, it has been found that separate interpolations of anomaly and absolute fields (for both temperature and precipitation) work best.
• With all interpolation techniques, it is imperative to derive uncertainties in the analyzed gridded fields, and it is important to realize that these should additionally take into account components from observation errors, homogeneity adjustments, biases, and variations in spatial sampling.
• Where fields on different scales are required, interpolation techniques should incorporate a hierarchy of analysis fields, in which the daily interpolated fields average or sum to the monthly interpolated fields.
• Research to develop and implement improved interpolation techniques, including full spatio-temporal treatments, is required to improve analyses. Developers of interpolated datasets should collaborate with statisticians to ensure that the best methods are used.
• The methods and data used to produce interpolated fields should be fully documented and guidance on the suitability of the dataset for particular applications provided.
• Interpolated fields and their associated uncertainties should be validated.
• The development, comparison and assessment of multiple estimates of environmental fields, using different input data and construction techniques, are essential to understanding and improving analyses.
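To illustrate the anomaly transformation referred to above, here is a minimal Python sketch. The synthetic station series, the 1961-1990 base period and the years-by-months array layout are assumptions for illustration only.

```python
# Sketch only: the anomaly transformation mentioned above, applied to a
# hypothetical monthly station series.  The series, the 1961-1990 base period
# and the array layout (years x 12 months) are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

years = np.arange(1951, 2011)                                   # 60 years of made-up data
seasonal_cycle = 10 + 8 * np.sin(2 * np.pi * np.arange(12) / 12)
monthly = seasonal_cycle + rng.normal(0, 1, (years.size, 12))   # degC, shape (60, 12)

# Climatology from a common base period: one mean value per calendar month.
in_base = (years >= 1961) & (years <= 1990)
climatology = monthly[in_base].mean(axis=0)

# Anomalies are departures from that climatology; interpolating anomalies
# avoids the large spatial gradients present in absolute temperatures.
anomalies = monthly - climatology

print("Base-period climatology (degC):", np.round(climatology, 1))
print("Mean anomaly 1991-2010 (degC): %.2f" % anomalies[years >= 1991].mean())
```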
Wednesday, July 28, 2010
White paper 15 - Governance
The governance white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and where possible refer to one or more of section titles, pages and line numbers to make it easy to cross-reference your comment with the document.
The recommendations are reproduced below:
1. Efforts should be made to capitalise on the work of existing and ongoing projects.
2. A Steering Committee should be established, to be composed of the chairs of the various working level groups established to conduct the various elements of the work, to include experts from outside the climate science community, and a number of ‘project evangelists’.
3. The steering committee should have designated authority to make reasonable choices to allow for flexibility and efficacy.
4. The steering committee should annually report to stakeholder bodies on the basis of a single annual report.
5. A small number (3-5) of ‘project evangelists’ should be identified and mandated to proactively engage with external stakeholders and facilitate effective two-way communications.
6. Findings and conclusions should be reported in the scientific literature and at relevant conferences.
Tuesday, July 27, 2010
White paper 6 - Data provenance, version control, configuration management
The data provenance, version control and configuration management white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and where possible refer to one or more of section titles, pages and line numbers to make it easy to cross-reference your comment with the document.
The recommendations are reproduced below:
• As an outcome of the workshop, there should be a clear definition of primary (Level 0) and secondary (Level 1) source database across the spectrum of observing systems which may contribute data to the land surface temperature database.
• We should establish a coordinated international search and rescue of Level 0, primary-source climate data and metadata, both documentary and electronic (see White Paper 3). This effort would recognize and support similar on-going national projects. Once located, the project should (a) provide, if necessary, a secure storage facility for these documents or hard-copies of same, (b) create, where appropriate, digital images of the documents for the archive for traceability and authenticity requirements, (c) key documentary information into digital files (native format in Level 1 and uniform format in Level 2), (d) archive, test and quality-assure raw data files, technical manuals and conversion algorithms which are necessary to understand how the geophysical variable may be unpacked and generated from electronic instrumentation, and (e) securely archive the files for public access and use.
• A certification panel will be selected to rate the authenticity of source material as to its relation to the “primary-source”, i.e. to certify a level of confidence that the Level 1 data, as archived, represents the original values from the Level 0 primary source. The process will often be dynamic, since we anticipate that new information will always become available to confirm or cast doubt on the current authenticity rating.
• Given the extent of this project and the unpredictable nature of the evolution of the archive, the reliance on an active panel to address version-control issues as they arise will be necessary. The panel will investigate the possibility of utilizing commercial off-the-shelf or open-source version control software for electronic files and software code (e.g. Subversion, http://subversion.apache.org/).
• Since one requirement of this project is to preserve older versions of the archive, and since a considerable amount of painstaking research will be performed on any one version, it is generally assumed that up-versioning of the basic Level 2 digital archive will be performed as sparingly as possible (a minimal provenance-tracking sketch follows this list).
• The algorithms that produce the datasets used for testing and the datasets themselves must be documented and version-controlled.
• A configuration management board will be selected to initially define the necessary infrastructure, formats and other aspects of archive practices. A permanent board will then be selected to oversee the operation. This board and the version-control panel may be coincident or at least overlapping in membership.
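As one possible illustration of the provenance and up-versioning recommendations above, here is a minimal Python sketch of a checksum-based provenance registry. The file names, registry layout and level labels are assumptions; the white paper itself does not prescribe any particular format or tool.

```python
# Sketch only: record provenance for archived databank files (level, source,
# checksum, version) so that re-keyed or up-versioned files remain traceable.
# File names, the registry layout and the level labels are illustrative assumptions.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

REGISTRY = Path("provenance_registry.json")

def sha256(path: Path) -> str:
    """Checksum used to detect any silent change to an archived file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def register(path: Path, level: str, source: str) -> dict:
    """Append a provenance record; a new record (not an overwrite) marks up-versioning."""
    records = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else []
    prior = [r for r in records if r["file"] == path.name]
    record = {
        "file": path.name,
        "level": level,                 # e.g. "Level 0" scanned image, "Level 2" common format
        "source": source,
        "sha256": sha256(path),
        "version": len(prior) + 1,
        "registered": datetime.now(timezone.utc).isoformat(),
    }
    records.append(record)
    REGISTRY.write_text(json.dumps(records, indent=2))
    return record

if __name__ == "__main__":
    demo = Path("station_12345_1901.csv")        # hypothetical keyed Level 2 file
    demo.write_text("year,month,tmax,tmin\n1901,1,3.2,-1.4\n")
    print(register(demo, level="Level 2", source="keyed from scanned logbook"))
```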
White paper 4 - Near real-time updates
The near real-time updates white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and where possible refer to one or more of section titles, pages and line numbers to make it easy to cross-reference your comment with the document.
The recommendations are reproduced below:
Recognizing that (i) there is no analog to CLIMAT bulletins on the daily timescale, (ii) daily data are not routinely shared, (iii) no central repository exists because of the lack of a formal process for sharing data with daily resolution, and (iv) the “climatological data” code group in synoptic bulletins does not support the climate community's need for daily summary observations, establish a formal mechanism for dissemination of daily climate messages or a requirement for transmission of daily climate observations with synoptic reports.
Recognizing more pronounced issues of the same sort at the basic instantaneous data level, consider as a secondary priority the feasibility (including data exchange policy issues) of adoption of international mechanisms to standardize the exchange of the highest resolution data.
Recognizing that a limited number of bilateral arrangements (e.g. US-Australia, US-Canada) have proven effective at improving access and near real-time data sharing of daily and sub-daily data, establish efforts at the WMO regional level to expand bilateral arrangements for sharing of daily and sub-daily data to increase data holdings and foster regular updates of global and regional daily data sets.
Recognizing that training programs such as those of the JMA have proven effective at improving NMHS capabilities to provide CLIMAT data, expand training opportunities at the regional and national level to improve the routine and regular dissemination of CLIMAT bulletins from developing nations.
Recognizing the success that GCOS Monitoring Centres and CBS Lead Centres for GCOS have had in improving the quality and quantity of CLIMAT reports, continue to support monitoring of the quality and completeness of CLIMAT transmissions and feedback to data providers. Work to garner commitments for enhanced monitoring and feedback related to synoptic bulletins and daily climate summaries.
Recognizing the efficiencies and flexibility of Table-Driven Code Forms for transferring large amounts of data, their design for ease and efficiency of processing, as well as their cost-effectiveness, encourage and support NMCs' conversion of data transmissions to TDCFs, while at the same time ensuring adequate attention to issues of long-term data homogeneity in support of climate research.
Recognizing that the GTS is not structured to meet newly evolving requirements for the exchange of data in near real-time, and recognizing the 2003 agreement to move to a WMO Information System (WIS) to meet all of the WMO’s information needs, support adoption of WIS technologies and encourage establishment of GISCs and DCPCs.
White paper 3 - Retrieval of historical data
The Retrieval of historical data white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and where possible refer to one or more of section titles, pages and line numbers to make it easy to cross-reference your comment with the document.
The recommendations are reproduced below:
1. A formal governance structure is required for the databank construction and management effort that will also extend to cover other white paper areas on the databank. This requires a mix of management and people with direct experience wrestling with the thorny issues of data recovery and reconciliation, along with expertise in database management and configuration management.
2. We should look to create a version 1 of the databank from current holdings at NCDC augmented by other easily accessible digital data to enable some research in other aspects of the surface temperature challenge to start early. We should then seek other easier targets for augmentation to build momentum before tackling more tricky cases.
3. Significant efforts are required to source and digitise additional data. This may be most easily achieved through a workshop or series of workshops. More important is to bring the ongoing and planned regional activities under the same international umbrella, in order to guarantee that the planned databank can benefit from these activities. The issue is two-fold: first the releasing of withheld data, and secondly the digitising of data in hard copy that is otherwise freely available.
4. The databank should be a truly international and ongoing effort not owned by any single institution or entity. It should be mirrored in at least two geographically distinct locations for robustness.
5. The databank should consist of four fundamental levels of data: level 0 (digital image of hard copy); level 1 (keyed data in original format); level 2 (keyed data in common format) and level 3 (integrated databank/DataSpace) with traceability between steps. For some data not all levels will be applicable (digital instruments) or possible (digital records for which the hard copy has been lost/destroyed), in which case the databank needs to provide suitable ancillary provenance information to users.
6. Reconciling data from multiple sources is non-trivial requiring substantial expertise. Substantial resource needs to be made available to support this if the databank is to be effective.
7. There is more data to be digitised than there is dedicated resource to digitise it. Crowd-sourcing of digitisation should be pursued as a means to maximise data recovery efficiency. This would very likely be most efficiently achieved through a technological rather than an academic or institutional host. Data should be double keyed and an acceptable sample check procedure undertaken (a minimal reconciliation sketch follows this list).
8. A parallel effort as an integral part of establishing the databank is required to create an adjunct metadata databank that as comprehensively as feasible describes known changes in instrumentation, observing practices and siting at each site over time. This may include photographic evidence, digital images and archive materials but the essential elements should be in machine-readable form.
9. New, formalized WMO arrangements, similar to those used in the marine community, may need to be developed to facilitate more efficient exchanges of historical and contemporary land station data and metadata (including possibilities for further standardization).
10. In all aspects these efforts must build upon existing programs and activities to maximise efficiency and capture of current knowledge base. This effort should be an enabling and coordination mechanism and not a replacement for valuable work already underway.
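To illustrate the double-keying recommendation (7) above, here is a minimal Python sketch that reconciles two independent keyings of the same page and flags disagreements for re-keying. The record layout and tolerance are assumptions for illustration only.

```python
# Sketch only: reconcile two independent keyings of the same logbook page and
# flag disagreements for manual re-checking, as in the double-keying recommendation.
# The record layout and tolerance are illustrative assumptions.

key_a = {("1903-07-01", "tmax"): 24.3, ("1903-07-01", "tmin"): 11.2,
         ("1903-07-02", "tmax"): 25.1, ("1903-07-02", "tmin"): 12.0}
key_b = {("1903-07-01", "tmax"): 24.3, ("1903-07-01", "tmin"): 11.2,
         ("1903-07-02", "tmax"): 25.7, ("1903-07-02", "tmin"): 12.0}  # one keying error

def reconcile(a, b, tolerance=0.0):
    """Return agreed values and a list of (key, value_a, value_b) needing re-keying."""
    agreed, disputed = {}, []
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if va is not None and vb is not None and abs(va - vb) <= tolerance:
            agreed[key] = va
        else:
            disputed.append((key, va, vb))
    return agreed, disputed

agreed, disputed = reconcile(key_a, key_b)
print(f"{len(agreed)} values agreed, {len(disputed)} sent back for re-keying:")
for key, va, vb in disputed:
    print("  ", key, va, vb)
```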
Monday, July 26, 2010
White paper 8 - Creation of quality controlled homogenised datasets from the databank
The creation of quality controlled homogenised datasets from the databank white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and where possible refer to one or more of section titles, pages and line numbers to make it easy to cross-reference your comment with the document.
The recommendations are reproduced below:
• To use daily maximum/minimum temperature as the ‘base’ data set to which adjustments are made, with data at monthly and longer timescales derived from the daily data (adjusted where appropriate) rather than adjusted separately (a minimal sketch follows this list).
• To ensure that all detection and adjustment of inhomogeneities is fully documented, allowing reassessments to be made in the future (e.g. if new techniques are developed or previously unknown data or metadata become available).
• To carry out an objective evaluation of known methods for homogenisation/adjustment, in collaboration with the COST action;
• To establish a testbed of data for this purpose (see white paper 9);
• To seek to ensure that all sources of uncertainty are well quantified and defined.
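To illustrate the first recommendation, here is a minimal Python sketch that derives a monthly mean from (already adjusted) daily maximum/minimum temperatures rather than adjusting the monthly series separately. The completeness rule and data layout are assumptions for illustration, not anything agreed by the working group.

```python
# Sketch only: derive monthly mean temperature from (already adjusted) daily
# Tmax/Tmin rather than adjusting the monthly series separately.
# The completeness rule (<= 5 missing days) and the data layout are assumptions.
import numpy as np

rng = np.random.default_rng(1)
days_in_month = 31

tmax = 20 + rng.normal(0, 3, days_in_month)       # hypothetical adjusted daily Tmax (degC)
tmin = 10 + rng.normal(0, 3, days_in_month)       # hypothetical adjusted daily Tmin (degC)
tmin[[4, 17]] = np.nan                            # two missing days

def monthly_mean(tmax, tmin, max_missing=5):
    """Monthly mean of (Tmax+Tmin)/2, or NaN if too many daily values are missing."""
    daily_mean = (tmax + tmin) / 2.0
    if np.isnan(daily_mean).sum() > max_missing:
        return np.nan
    return np.nanmean(daily_mean)

print("Monthly mean derived from daily data: %.2f degC" % monthly_mean(tmax, tmin))
```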
White paper 13 - Publication, collation of results, presentation of audit trails
The publication, collation of results, presentation of audit trails white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and where possible refer to one or more of section titles, pages and line numbers to make it easy to cross-reference your comment with the document.
The recommendations are reproduced below:
1. When releasing data products, we recommend that the following information must be provided about the process of product generation, in addition to the data itself, for the product to be considered an output of the project (a sketch of a machine-readable release manifest follows these recommendations):
(1) A listing of the source data (databank version, stations, period) along with methodological rationale
(2) A file describing the quality control method and quality-control metadata flags;
(3) Homogeneous and / or gridded version of the data;
(4) Quality assessment report produced by running against at least a minimum set of the common test cases described in previous white papers;
(5) A published paper on the data construction method and related products in the peer-reviewed literature in a journal recognized by ISI;
(6) Publication of an audit trail describing all intermediate processing steps, with a strong preference for inclusion of the source code used.
2. Datasets should be served or at the very least mirrored in a common format through a common portal akin to the CMIP portal to improve their utility.
3. Utility tools should be considered that manipulate these data in ways that end users wish.
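To illustrate how the six required items might travel with a released product, here is a minimal Python sketch of a machine-readable release manifest. The field names and example values are assumptions for illustration; no such format has been agreed.

```python
# Sketch only: a machine-readable release manifest covering the six items the
# white paper says must accompany any data product.  Field names and the example
# values are illustrative assumptions, not a prescribed format.
import json

REQUIRED = [
    "source_data",          # (1) databank version, stations, period, rationale
    "quality_control",      # (2) QC method and flag definitions
    "product_files",        # (3) homogenised and/or gridded version of the data
    "assessment_report",    # (4) results against the common test cases
    "peer_reviewed_paper",  # (5) reference to the ISI-journal paper
    "audit_trail",          # (6) intermediate processing steps, ideally with source code
]

manifest = {
    "source_data": {"databank_version": "v1.0.0", "stations": 7280, "period": "1850-2010"},
    "quality_control": "qc_description.pdf",
    "product_files": ["product_gridded_5deg.nc"],
    "assessment_report": "benchmark_results.pdf",
    "peer_reviewed_paper": "doi:10.xxxx/example",
    "audit_trail": "processing_steps.md",
}

missing = [field for field in REQUIRED if field not in manifest]
if missing:
    raise ValueError(f"Release manifest incomplete, missing: {missing}")
print(json.dumps(manifest, indent=2))
```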
White paper 9 - Benchmarking homogenisation algorithm performance against test cases
The benchmarking homogenisation algorithm performance against test cases white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and where possible refer to one or more of section titles, pages and line numbers to make it easy to cross-reference your comment with the document.
The recommendations are reproduced below:
• Global pseudo-data with real world characteristics
• GCM or reanalysis data should be used as the source base, with real spatial, temporal and climatological characteristics applied to recreate, to a reasonable approximation, the statistics of the observational record
• Review of inhomogeneity across the globe finalised via a session at an international conference (link with White Paper 8) to ensure plausibility of discontinuity test cases
• Suite of ~10 discontinuity test cases that are physically based on real world inhomogeneities and orthogonally designed to maximise the number of objective science questions that can be answered (a minimal sketch follows this list)
• Benchmarking to rank homogenisation algorithm skill in terms of performance using climatology, variance and trends calculated from homogeneous pseudo-data and inhomogeneous data (discontinuity test cases applied) (skill assessment to be synchronised with broader efforts discussed in White Paper 10)
• Independent (of any single group of dataset creators) pseudo-data creation, test case creation and benchmarking
• Peer-reviewed publication of benchmarking methodology and pseudo-data with discontinuity test cases, but ‘solutions’ (original homogeneous pseudo-data) to be withheld
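To illustrate how a discontinuity test case might be constructed from homogeneous pseudo-data, here is a minimal Python sketch. The break date, break size and noise model are assumptions for illustration; real test cases would be physically based and their ‘solutions’ withheld, as recommended above.

```python
# Sketch only: build an inhomogeneous test series by applying a known step
# change to homogeneous pseudo-data, so that homogenisation skill can later be
# scored against the withheld "truth".  Break date, size and noise model are assumptions.
import numpy as np

rng = np.random.default_rng(7)
n_months = 600                                    # 50 years of hypothetical monthly data

truth = rng.normal(0.0, 0.8, n_months)            # homogeneous pseudo-data ("solution")
breakpoint = 372                                  # known (withheld) break location
shift = -0.6                                      # e.g. a station move of -0.6 degC

test_case = truth.copy()
test_case[breakpoint:] += shift                   # apply the discontinuity

# Benchmark bookkeeping: the test case is released, the truth is withheld.
released = {"series": test_case, "metadata": "station relocation, date withheld"}
withheld = {"series": truth, "breakpoint": breakpoint, "shift": shift}

# Skill could later be scored as, e.g., the trend error remaining after homogenisation.
slope_test = np.polyfit(np.arange(n_months), test_case, 1)[0]
slope_truth = np.polyfit(np.arange(n_months), truth, 1)[0]
print("Trend error introduced by the break: %.4f degC/month" % (slope_test - slope_truth))
```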
White paper 5 - Data policy
The data policy white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and where possible refer to one or more of section titles, pages and line numbers to make it easy to cross-reference your comment with the document.
The recommendations are reproduced below:
1. Enhance data availability
a. Build a central databank in which both the original temperature observations as well as multiple versions of the value-added datasets, i.e., quality controlled, homogenized and gridded products, are stored and documented together (including version control). The opportunity to repeat any enhanced analysis should exist. Not only will the methods used for adding value change over time and between scientists, but the data policy will change as well.
b. Provide support for digitization of paper archives wherever they may exist with the proviso that any data (and metadata) digitized under this program be made available to the central databank.
c. Enhance the international exchange of climate data by linking this activity to joint projects of global and regional climate system monitoring and by promoting the free and open access of existing databanks in accordance with set principles, e.g., those of the GEO.
2. Enhance derived product availability
a. Accept that there is a trade off between transparency and data quantity used for derived products. Transparency and openness, which scientists (including the authors) advocate, are hampered by the data policies of national governments and their respective NMHSs. Data policy issues are persistent and unlikely to change in the near future.
b. Hold a series of workshops to homogenize data and produce a gridded dataset. The original and adjusted data might not be able to be released but the gridded dataset and information on the stations that contributed to each grid box value would be released. These gridded datasets could be used by NMHSs to monitor their climate and fit together seamlessly into a global gridded dataset.
c. Ensure that the datasets are correctly credited to their creators and that related rights issues on the original data and the value-added products are observed and made clear to potential users. The conditions will be different for bona fide research and commercial use of data.
3. Involve NMHSs from all countries
a. Acknowledge that involvement of data providers (mainly NMHSs) from countries throughout the world is essential for success, and involves more than simply sending the data to an international data centre. For all nations contributing station records to benefit from this exercise, the scientific community needs to also deliver derived climate change information which can be used to support local climate services by the NMHSs. This return of investment is of particular importance for developing countries.
b. Adopt an end-to-end approach in which data providers are engaged in the construction and use of value-added products, not only because it is at the local level where the necessary knowledge resides on the procedures and circumstances under which the observations have been made, but also because this will make it easier to overcome access restrictions to the original data.
c. Increase the pressure on those countries not inclined to follow a more open data policy by engaging with institutions widely beyond the community of research scientists, including funding bodies, the general public, policy makers and international organisations.
Wednesday, July 14, 2010
Welcome and house rules
This blog will host the series of white papers for discussion at the workshop to be held at the UK Met Office in September 2010. More details are available from http://www.surfacetemperatures.org. The white papers will be posted on or around July 27th and will be open for public comment for approximately four weeks. Update 8/19: Comments will remain open until 9/1.
All posts will be moderated and will be processed at 8am EDT each weekday and at no other time. Posts deemed by the blog owner to breach any of these guidelines, even in part, will be deleted. Accepted posts will be scientifically relevant, on topic, concise and constructive in tone. Any of the following will lead to post rejection:
- Off topic posts or posts with too much off-topic content
- Defamatory language or insinuations against others
- Overt linkage to non-relevant resources (a very limited number of necessary links to strictly relevant scientific papers or scientific web pages are allowed, but no links to blogs)
- Comments on the reality or otherwise of global warming and political ramifications - there are plenty of other blogs out there to make these comments on.
- Comments judged to be too long.
The purpose of the blog is to get relevant input from everybody as clearly we can only invite so many people to the meeting itself and there are many more scientifically relevant viewpoints out there. We welcome comments from anyone regardless of qualifications who believes they have something positive, constructive and relevant to provide. We will ensure the comments are considered within the September workshop.