Thursday, December 15, 2011

Initiative overview paper published in the Bulletin of the American Meteorological Society

The first peer-reviewed paper describing the envisaged end-to-end scope of the International Surface Temperature Initiative has been published this week by the Bulletin of the American Meteorological Society. It is an Open Access paper. Any comments, queries or offers of effort are very welcome, either through this blog or via the general enquiries email. Updates since the paper submission can be found at www,

Thursday, November 10, 2011

GHCN-M v3.1.0 – showing the value of engaging with software engineers

NCDC have just released version 3.1.0 of GHCN-M, the Global Historical Climatology Network Monthly, which is their global Land Surface Air Temperature product. The release is detailed in a tech note, and the product itself is documented in the dataset paper. This release does two things.

Firstly, it incorporates an array processing algorithm that significantly speeds up the processing. This will enable NCDC to process the much larger databank holdings, once the databank's first version is released in early-to-mid 2012, to form a yet more comprehensive estimate of the evolution of global Land Surface Air Temperature.

Secondly, and the focus of this post, it incorporates a set of five process bug fixes. Four of these bugs were discovered in the homogenization algorithm as a result of an effort undertaken by Daniel Rothenberg, sponsored by the Google Summer of Code and mentored by the Climate Code Foundation. The final bug was discovered by carefully checking for bugs of a similar kind, which essentially related to array compression / non-compression for missing values when passing data between routines. That bugs exist in what is several thousand lines of code is hardly surprising. In fact it would have been far more surprising if no bugs had been found. Daniel visited NCDC as part of his project and the bugs were discussed at length with relevant NCDC staff. Fixes have subsequently been made and extensively validated, and their impacts on the analysis documented.
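For readers unfamiliar with this class of bug, here is a minimal sketch in Python. It is not the NCDC code and does not reproduce any of the actual bugs. The sentinel `MISSING` and all function names are hypothetical, made up for illustration. It shows how an index that refers to the full (non-compressed) series can be misapplied once the missing values have been squeezed out of the array.

```python
MISSING = -9999  # hypothetical missing-value sentinel, assumed for illustration


def compress(series):
    """Drop missing values; return the kept values and their original indices."""
    idx = [i for i, v in enumerate(series) if v != MISSING]
    vals = [series[i] for i in idx]
    return vals, idx


def expand(vals, idx, length):
    """Put compressed values back into a full-length series."""
    out = [MISSING] * length
    for i, v in zip(idx, vals):
        out[i] = v
    return out


def adjust_buggy(series, start, offset):
    """Bug: `start` is an index into the full series, but it is compared
    against positions in the compressed array."""
    vals, idx = compress(series)
    adjusted = [v + offset if k >= start else v for k, v in enumerate(vals)]
    return expand(adjusted, idx, len(series))


def adjust_fixed(series, start, offset):
    """Fix: compare `start` against the original (non-compressed) indices."""
    vals, idx = compress(series)
    adjusted = [v + offset if i >= start else v for i, v in zip(idx, vals)]
    return expand(adjusted, idx, len(series))


if __name__ == "__main__":
    series = [10, MISSING, 11, MISSING, 12, 13]
    print("buggy:", adjust_buggy(series, start=2, offset=1))
    print("fixed:", adjust_fixed(series, start=2, offset=1))
```

In the buggy version the adjustment starts later than intended whenever missing values precede the start index, so one value that should have been adjusted is left untouched. Nothing crashes, which is why this kind of error is easy to overlook without an independent reimplementation.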

The bottom line impact on the global mean trend is a difference of less than 0.002 K/decade. This is below the typically quoted global mean estimate precision of 2 decimal places and two orders of magnitude less than the reported centennial scale global-mean Land Surface Air Temperature warming rate from this dataset. Equally, global annual means show negligible differences. Differences at the station level are almost always below 0.2 K/decade with effectively zero mean change. So, whilst the bug fixes were important from both a science and process perspective, they do not significantly alter our current understanding of changes in climate at the largest spatial scales and longest timescales.

What this does provide is an example of the very real potential value in openness and transparency, in code replication, and in working in positive partnership to resolve the issues that arise. Daniel aims to continue working on his port of the algorithm to Python and it will be of great interest to see what other benefits may accrue.

NCDC have released the old (v.3.0.0) and new (v.3.1.0) versions of the homogenized data (in frozen form) and other relevant metadata (with ongoing additions) at

Monday, October 31, 2011

WCRP OSC thoughts

WCRP OSC was a very large conference, mainly poster based. The sheer volume of posters was overwhelming. Plenary talks were generally very good, whereas the parallel sessions were a mixed bag with a number of real gems. Being talked at for ten hours a day is too long though, and interest inevitably wanes. Wireless connections certainly aren't a help in that regard, allowing people to attend without truly being in attendance. I was presenting four posters across two sessions (one on my other 'hobby' - the GCOS Reference Upper Air Network) on the Tuesday morning and giving a talk on the surface temperature initiative on the Tuesday afternoon.

I warmed up for this by asking a question in the observations plenary session, with c. 2000 attendees, first thing on Tuesday, which was nerve-wracking in its own right. Two of the plenary speakers had bemoaned the lack of agreement between estimates for many variables and stated that they were ‘scared’. I pointed out that this was an inevitable consequence of making measurements that were not traceable to measurement standards, and that I was instead encouraged to see multiple estimates, as this was the only way we could ascertain what could / could not be said. None of the speakers responded, so either it was an awful point to make or they did not wish to respond.

I spent the majority of the poster time around the three surface temperature initiative posters. Like many of the posters, they were in a corner, but there was still reasonable interest and a number of potential data leads were identified. Roughly half of the 50 hard copies of the data request cover letter and data submission guidelines were taken. Most of the discussants were supportive, although inevitably some raised the Berkeley effort and asked whether this now obviated the need for the initiative as a whole. This gave an opportunity to clarify the holistic nature of the enterprise and how the Berkeley effort, if published(!), would simply constitute one important contributing component. It was stressed that science and society are interested in more than the global centennial timescale trend and that differences would be greater at smaller space and timescales. It was also stressed that consistent benchmarking was necessary to understand differences more robustly. See also Steve Easterbrook's take on the poster that he was presenting on benchmarking.

The afternoon talk was given in a parallel session with probably 300-500 people (it felt like the latter!) in attendance. It was a little bit rabbit in the headlights for the first half, although better towards the end. There were at least three (maybe four) questions from the floor. Several people then had questions after the end of the session, which kept me busy for the full half hour coffee break and beyond. These gave a chance to expand on various aspects, especially crowdsourcing, with a much smaller audience. Questions about whether data holdings known to a given individual were already in the databank highlighted the need to clarify that we wish to get hold of any and all data, and that the databank processing will be designed to account for such redundancy in an open and transparent way. Regardless, a concatenated master-list of current holdings at stage 2 level was requested. This has now been added to the databank prototype.

Tuesday, October 18, 2011


Or acronym soup?

For those who follow this effort and will be attending next week's WCRP OSC conference in Denver, Colorado, we will have a talk in Session B4 (Tuesday @15.00) and four posters in Session C13 (Tuesday morning). The posters are also posted online, and the oral presentation will be uploaded there after the event. Please do pop by and say hello.

Monday, October 17, 2011

ISTI Meeting Reports: 4th ACRE Workshop and GCOS Steering Committee meeting

I've just got back from presenting the work of the International Surface Temperature Initiative at both the 4th ACRE Workshop (Utrecht, Netherlands) and to the GCOS Steering Committee meeting (ECMWF, UK). The presentations and meeting reports are now hosted at:


Wednesday, July 27, 2011

You may have noticed we have a logo now ...

Things are also starting to take shape in revamping the website. Any comments on this, and suggestions as to how to make it more useful, would be gratefully appreciated.

Thursday, July 14, 2011

Overarching implementation plan published

We have today published an implementation plan for the initiative as a whole. The plan focuses primarily upon the steps necessary to complete the first databank version and benchmarking and assessment cycle. Comments upon this document are welcome.

Tuesday, April 26, 2011

Steering committee terms of reference and meeting minutes

The latest steering committee minutes are available along with a first version of their terms of reference. An Implementation Plan and terms of reference for sub-groups will follow soon. This may all seem incredibly boring (and it generally is) but it is also absolutely necessary for the initiative to function properly if it is to be a successful multi-person, multi-institution, multi-year effort. Comments welcome as always.

Wednesday, April 6, 2011

Prototype for the databank publicly available

A very initial version of the envisaged global surface databank is now available. It should be stressed that this is in the very early stages of development. A full version release is not expected until early to mid 2012. This allows time to harvest additional data sources, reprocess, merge, and add provenance information so that it represents a significant advance on what has gone before in terms of both completeness and fundamental scientific value. Comments are welcome here but would be more appropriate at the new databank blog.

Tuesday, April 5, 2011

Two new initiative related blogs

Two new, more working-level blogs have been set up recently. These are more technical discussion areas than this blog. Both allow working group members to add posts, and (moderated) comments are allowed from anybody else. The first covers work towards a global surface databank. If you know of data sources, please head on over and provide leads in the post comments. The second covers work towards a set of benchmark analogs to the databank, on which algorithm creators can run their algorithms to ascertain both absolute and relative performance.

Wednesday, March 23, 2011

Data provenance and versioning task team

There is a data provenance and versioning task team for the databank effort, with details available online. This task team has now met twice by telephone and made some initial inroads into the problem.

More generally, the main site continues to be updated with progress as it happens. Other commitments have meant that many of these updates have gone unnoted here. My apologies.

Friday, January 28, 2011

Data bank task team on data rescue set up

Further details can be found here.

For anyone wondering how they can help to improve the data holdings right now, you might want to take a look at the online efforts for upper air and land surface records, or for World War 1 UK ship records. It is hoped that many of the substantial land data holdings currently available only in image / hard copy form can eventually be digitized by citizen scientists over the internet.

Second teleconference of steering committee

Notes are posted here. Comments welcome.