Friday, September 12, 2014

The Databank Near Real Time Update System

Since the official release in June, we have worked to keep the databank up to date. Each month we will post new data from sources that update in near real time (NRT), along with an updated version of the recommended merge with the latest data appended. Stage 1 data (digitized in its original form) will be updated no later than the 5th of each month, and Stage 2 (common formatted) and Stage 3 (merged record) data will be updated no later than the 11th.

So what data get updated in our NRT system? We have identified four sources that provide updated data within the first few days of each month: the CLIMAT streams from NCDC and from the UK, the unpublished form of the Monthly Climatic Data for the World (MCDW), and GHCN-D. As in the merge program, a hierarchy determines which source's data are appended when sources conflict. The hierarchy is:

1) GHCN-D
2) CLIMAT-UK
3) CLIMAT-NCDC
4) MCDW-Unpublished
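
To make the hierarchy concrete, here is a minimal Python sketch of how a conflict could be resolved when more than one source reports a value for the same station and month. It is an illustration only, not the databank's actual code: the function name `resolve_conflicts`, the tuple layout, and the sample values are all assumptions; only the source ordering comes from the list above.

```python
# Source priority from the list above; a lower number wins a conflict.
PRIORITY = {
    "GHCN-D": 1,
    "CLIMAT-UK": 2,
    "CLIMAT-NCDC": 3,
    "MCDW-Unpublished": 4,
}


def resolve_conflicts(candidates):
    """Pick one value per (station, year, month) using the source hierarchy.

    `candidates` is an iterable of (station_id, year, month, source, value)
    tuples. This layout is hypothetical, chosen only for the example.
    Returns a dict mapping (station_id, year, month) -> (source, value).
    """
    chosen = {}
    for station_id, year, month, source, value in candidates:
        key = (station_id, year, month)
        current = chosen.get(key)
        # Keep the new value only if nothing is stored yet, or if it comes
        # from a source that ranks higher than the one already stored.
        if current is None or PRIORITY[source] < PRIORITY[current[0]]:
            chosen[key] = (source, value)
    return chosen


# Made-up sample values: two sources report the same station-month.
sample = [
    ("STATION_A", 2014, 8, "CLIMAT-NCDC", 21.4),
    ("STATION_A", 2014, 8, "GHCN-D", 21.6),
    ("STATION_B", 2014, 8, "MCDW-Unpublished", 10.0),
]
resolved = resolve_conflicts(sample)
```

In this sketch the GHCN-D value is kept for STATION_A because it ranks first, while STATION_B keeps its only available value.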

An overview of the system is shown in this flow diagram:

The algorithm to append data looks for station matches using the same metadata tests described in the merge program: geographic distance, height distance, and station name similarity using the Jaccard Index. If the metadata metric is good enough, an ID test is then used to confirm the station match. Because each of the four input sources carries either a GHCN-D or WMO ID, the matching is much easier here than in the merge program. Once a station match is found, new data from the past few months are appended. Throughout this process, no new stations are added.
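
For readers who want a feel for the matching step, the following Python sketch shows one way the metadata tests and the ID test could fit together. It is a simplified illustration under stated assumptions, not the databank's implementation: the Jaccard Index is computed here on character bigrams of the station name, the metadata tests are reduced to simple thresholds, and the threshold values, dictionary keys, and function names are all hypothetical.

```python
import math


def jaccard_index(name_a, name_b):
    """Jaccard Index |A & B| / |A | B| on character bigrams (an assumption)."""
    def bigrams(name):
        s = name.upper().strip()
        return {s[i:i + 2] for i in range(len(s) - 1)}

    a, b = bigrams(name_a), bigrams(name_b)
    if not a or not b:
        # Names too short to form a bigram give no evidence of a match.
        return 0.0
    return len(a & b) / len(a | b)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def is_station_match(incoming, target, max_km=10.0, max_height_m=100.0, min_jaccard=0.5):
    """Metadata tests first, then the ID test. All thresholds are hypothetical."""
    metadata_ok = (
        haversine_km(incoming["lat"], incoming["lon"], target["lat"], target["lon"]) <= max_km
        and abs(incoming["elev_m"] - target["elev_m"]) <= max_height_m
        and jaccard_index(incoming["name"], target["name"]) >= min_jaccard
    )
    # The ID (GHCN-D or WMO) is only consulted once the metadata agree.
    return metadata_ok and incoming["id"] == target["id"]
```

A station that passes `is_station_match` would have its recent months appended to the existing record; a station that fails is skipped, which is consistent with no new stations being added during an update.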

We have had two monthly updates so far. As always, the latest recommended merge data can be found on our ftp page here, and older data are placed in the archive here. Note that we are only updating the recommended merge, not the variants. In addition, the merge metadata are not updated, because no new merge has been applied yet. We plan to have another merge out sometime in early 2015.