Friday, March 22, 2013

Initiative progress report published

The initiative progress report has been published and shared with our 'sponsors'. It provides a useful overview of what has been achieved and what is planned for the next year. It's been delayed due to demands on folks' time. But still, better late than never. Comments and feedback are welcome. All progress reports are archived at

Monday, March 18, 2013

Databank Release: Beta #3

We are nearing an official version 1 release of the global land surface databank. However, because there have been major changes since the last beta release in December, it seemed appropriate to push out one more beta so that the public can comment.

The beta3 release can be found here: Within that directory one can find all the data and code used, along with some graphics depicting the results of all the merge variants.

In addition, the previous betas are still available to look at, if anyone wishes to run comparisons.
The next couple of posts will go through the changes and additions in this beta release in more detail. In the meantime, here are the highlights:
  • A blacklist of candidate stations was generated, either to fix known errors in a station's metadata/data or to withhold the station completely. This is a required input file for the code to run and is provided with this beta release.
  • Some minor code changes were applied. Stations are now withheld when the metadata probability was near perfect but the data comparisons were so poor that the station was classed as unique (when it should have merged). In addition, odd characters are removed from the station name before the Jaccard Index is computed.
  • The format of stage 3 data was changed so that it is consistent with all stage 2 data. In addition, all data provenance flags have been carried over in order to be open and transparent.
  • Algorithm output is included with each variant result, in order to provide information about each candidate station and how the algorithm made its decision to merge / unique / withhold. A future post will go into great detail about each output file.
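
For readers unfamiliar with the Jaccard Index mentioned above, here is a minimal sketch of how a station name comparison of this kind could work. It is an illustration only and not the databank merge code: the function names, the choice of character bigrams, and the rule for what counts as an "odd character" (anything other than letters, digits and spaces) are all assumptions made for this example.

```python
import re


def clean_name(name: str) -> str:
    """Uppercase the name and drop anything that is not A-Z, 0-9 or a space.

    Assumption for this sketch: "odd characters" means punctuation and
    other symbols. The real code may use a different rule.
    """
    cleaned = re.sub(r"[^A-Z0-9 ]", "", name.upper())
    # Collapse runs of spaces left behind by removed characters.
    return " ".join(cleaned.split())


def bigrams(text: str) -> set:
    """Return the set of adjacent character pairs in the text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard_index(name_a: str, name_b: str) -> float:
    """Jaccard Index = |A intersect B| / |A union B| over character bigrams."""
    clean_a = clean_name(name_a)
    clean_b = clean_name(name_b)
    set_a = bigrams(clean_a)
    set_b = bigrams(clean_b)
    if not set_a and not set_b:
        # Names too short to form bigrams: fall back to exact comparison.
        return 1.0 if clean_a == clean_b else 0.0
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    # Hypothetical station names, used only to show the effect of cleaning.
    print(jaccard_index("ST. JOHN'S", "ST JOHNS"))
    print(jaccard_index("ABCD", "ABCE"))
```

With the cleaning step in place, two spellings of a name that differ only in punctuation score as identical, which is the kind of case the odd-character removal is meant to help with.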
As usual, these are not considered the final revisions prior to an official Version 1 release. In addition, all documentation provided on the FTP site will be superseded by a published version of the databank merge methodology paper, which we are working hard to submit to a peer-reviewed journal soon.

If you wish to provide comments, please feel free to send an e-mail to