Thursday, October 4, 2012

What known issues remain to be addressed with the databank during beta release?

First and foremost, this beta release affords the opportunity for broader community input to the databank process. Having many eyes on the prize, hands turning over the rocks and boots kicking the tires will make this thing better. There will undoubtedly be a number of issues beyond those highlighted here, and we welcome such feedback. There are also a number of issues which we already know we will address during beta:
  1. Where we have daily sources we will append provenance flags that specify how many daily reports went into each monthly value. This may prove useful to analysts down the line. Where we only have monthlies we will set this flag to a missing value indicator.
  2. We will create a version which re-injects the element-wise provenance flags from the constituent stage 2 (source deck) holdings into the stage 3 (merged) holdings. For reasons of computational efficiency it is not possible to carry the flags through the merge program, but the information needed to re-attach them in post-processing is available.
  3. We are well aware that some stations will have poor geo-location metadata (e.g. Spanish stations placed in the Sahara, or older station segments that use a prime meridian other than Greenwich). At present no blacklisting is applied, but we will definitely apply such blacklisting to the first version merge. We are still in the process of collating a list of known geo-location errors to which the fix will be applied. One of two things will be done: the geo-location data will be corrected where the correct location is known and generally accepted, or the code will be forced to withhold the station from the merge. If you find an apparent issue with a station’s location please let us know through the blog so that we can investigate and determine what to do.
  4. We plan to release all the stage 1 to stage 2 conversion scripts for completeness. This will be done soon. There are only so many hours in the day and getting the merge in order has been the priority.
  5. We plan to create several output formats to aid usability. We hope these will include CF-compliant netCDF. For now the databank is made available in two ASCII-based formats.
  6. We will institute a consistent station identifier system that is robust to future data additions and deletions and is consistent with the daily identifiers going forward.
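
As a rough illustration of item 1 above, the sketch below shows one way a per-month count of daily reports could be derived and used as a provenance flag. It is not the databank's actual code: the function names, the (year, month) keying and the -9999 missing value indicator are all assumptions made for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical missing value indicator; the databank's real choice may differ.
MISSING = -9999


def daily_counts_per_month(daily_dates):
    """Count how many daily reports fall in each (year, month)."""
    return Counter((d.year, d.month) for d in daily_dates)


def provenance_flag(year, month, counts, has_daily_source):
    """Return the number of daily reports behind a monthly value.

    If the station only has monthly data (no daily source), return the
    missing value indicator instead of a count.
    """
    if not has_daily_source:
        return MISSING
    return counts.get((year, month), 0)


# Example: three daily reports, two of them in October 2012.
reports = [date(2012, 10, 1), date(2012, 10, 2), date(2012, 11, 1)]
counts = daily_counts_per_month(reports)
october_flag = provenance_flag(2012, 10, counts, has_daily_source=True)
monthly_only_flag = provenance_flag(2012, 10, {}, has_daily_source=False)
```

Keeping the count separate from the monthly value means an analyst could later filter on it, for instance to exclude months built from only a handful of daily reports.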

