Saturday, February 28, 2015

Promotional flyers now available in French

International volunteers are helping to translate the promotional materials recently distributed at the COP meeting in Lima into additional languages. These will be made available through as they become available. Please distribute and use to promote the Initiative's aims and objectives at relevant venues and meetings.

With thanks to Lucie Vincent of Environment Canada and the graphics team at NOAA's National Climatic Data Center, versions in French are now available.

Tuesday, January 20, 2015

Because the POSTman always delivers ...

We recently had a full teleconference meeting of participants. If you are prone to insomnia the full minutes are available at this link.

The major news is that, after some discussions on the appropriate name for the group, ISTI does indeed have a new group ... the Parallel Observations Science Team (or POST), led by Victor Venema and Renate Auchmann.

You may recall a number of posts on this subject over at Victor's place. We shall work with colleagues to help further this effort. By being part of the formal ISTI family we will ensure that benefits regarding data holdings, benchmarking, and lessons learnt from this effort are more broadly shared. We always look for win-wins!

We are still looking at populating the parallel measurements database so if you know of any coincident measurements using distinct techniques or looking at spatial variability at the local scale (or both) then please do get in contact. Victor and Renate are also still populating this group (terms of reference here) so if parallel measurements are of interest and you feel you could contribute drop them a line.

More details on this effort can be found at

Thursday, January 1, 2015

Survey on national homogenised temperature data sets

I've recently run a survey on national homogenised temperature data sets. Whilst this was not an exhaustive survey (as indicated by the number of responses), it is an indication of what's out there and what resources various countries are putting into this work.

Survey reports were received from 18 countries (CHN, CAN, ISR, IRL, SUI, SLO, NOR, HUN, NED, ROM, GBR, AUT, SRB, ESP, CZE, SWE, UKR, AUS) and 1 region (Catalonia). Summary results were as follows:

1. Number of staff involved in homogenisation (full-time equivalent)

Less than 1    2 countries
1-2            9
2-4            5
4 or more      3

(global and continental data sets are excluded from this - for example, the UK have several people working on the HadCRUT data sets, and the Netherlands on ECA&D and associated projects)

2. Existence of a national homogenised data set

Yes                                          16
Yes but not yet released                     1
No national set but a station/regional set   1
No                                           1

3. Time resolution of data set

Daily                                      8
Monthly                                    7
Mix depending on element                   1
Monthly for early data, daily for later    1

4. Time resolution of adjustment

Results from this are a little unclear – several responses indicated use of the Vincent methodology, which interpolates adjustments based on monthly values to daily timescales.

Daily                                         4
Monthly                                       11
Monthly for detection, daily for adjustment   2

5. Elements included

Maximum, minimum and mean temperature   8
Maximum and minimum temperature         5
Mean temperature only                   4

(note that ‘maximum and minimum temperature’ implies mean temperature is not homogenised independently – in most cases it can still be calculated based on max/min)

6. Frequency of updating/reassessing homogeneity

Not updated                           6 (in 2 cases, the first data set has only just been completed)
Appended with unadjusted data only    2
Irregularly                           1
Annually or near-annually             4
Intervals longer than 2 years         4 (ranging from every 3 to every 10 years)

Thursday, December 11, 2014

Why we need max and min temperatures for all stations

I'm doing an analysis of Diurnal Temperature Range (DTR; more on that when published) but as part of this I just played with a little toy box model, and the result is of sufficiently general interest to highlight here and maybe get some feedback.

So, for most stations in the databank we have data for maximum (Tx) and minimum (Tn) temperature that we then average to get the mean (Tm). Now, that is not the only transform possible - there is also DTR, which is Tx-Tn. Although that is not part of the databank archive, it's a trivial transform. In looking at results from running NCDC's pairwise algorithm, distinct differences in breakpoint detection efficacy and adjustment distribution arise, which have caused great author team angst.

This morning I constructed a simple toy box where I just played what if. More precisely, what if I allowed seeded breaks in Tx and Tn in the range -5 to 5 and considered the break size effects in Tx, Tn, Tm and DTR:
The top two panels are hopefully pretty self explanatory. Tm and DTR effects are orthogonal which makes sense. In the lowest panel (note colours chosen from colorbrewer but please advise if issues for colour-blind folks):
red: Break largest in Tx
blue: Break largest in Tn
purple: break largest in DTR
green: break largest in Tm (yes, there is precisely no green)
Cases with breaks equal in size are no colour (infinitesimally small lines along diagonal and vertices at Tx and Tn = 0)

So …

If we just randomly seeded Tx and Tn breaks in an entirely uncorrelated manner into the series, then we would get 50% of breaks largest in DTR and 25% each in Tx and Tn. DTR should be broader in its overall distribution and Tm narrower, with Tx and Tn intermediate.

If we put in correlated Tx and Tn breaks such that they were always the same sign (but not magnitude), then they would always be largest in either Tx or Tn (or equal with Tm when Tx = Tn).

If we put in anti-correlated breaks then they would always be largest in DTR.

Perhaps most importantly, as alluded to above, breaks will only be equal largest for Tm in a very special set of cases where Tx break = Tn break. Breaks will, on average, be smallest in Tm. If breakpoint detection and adjustment is a signal to noise problem, it's not sensible to look where the signal is smallest. This has potentially serious implications for our ability to detect and adjust for breakpoints if we limit ourselves to Tm, and is why we should try to rescue Tx and Tn data for the large amount of early data for which we only have Tm in the archives.
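For anyone who wants to play along at home, the toy box can be approximated with a few lines of Python. This is a minimal sketch rather than the code used for the figures: it assumes breaks drawn independently and uniformly between -5 and 5, and the function name and sample size are made up for illustration.

```python
import numpy as np

def largest_break_fractions(n=200_000, bound=5.0, seed=0):
    """Fraction of seeded break pairs in which each element shows the largest absolute break."""
    rng = np.random.default_rng(seed)
    tx = rng.uniform(-bound, bound, n)  # break seeded in maximum temperature
    tn = rng.uniform(-bound, bound, n)  # break seeded in minimum temperature (uncorrelated with tx)
    tm = (tx + tn) / 2.0                # implied break in mean temperature
    dtr = tx - tn                       # implied break in diurnal temperature range
    sizes = np.abs(np.vstack([tx, tn, tm, dtr]))
    winner = sizes.argmax(axis=0)       # index of the element with the largest absolute break
    names = ["Tx", "Tn", "Tm", "DTR"]
    return {name: float(np.mean(winner == i)) for i, name in enumerate(names)}

if __name__ == "__main__":
    for name, fraction in largest_break_fractions().items():
        print(f"{name}: {fraction:.3f}")
```

With uncorrelated breaks, DTR wins whenever the two breaks have opposite signs (about half the time), Tx and Tn split the same-sign cases, and Tm never comes out strictly largest because |Tm| can never exceed the larger of |Tx| and |Tn|.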

Maybe in future we can consider this as an explicitly joint estimation problem of finding breaks in the two primary elements and two derived elements and then constructing physically consistent adjustment estimates from the element-wise CDFs. Okay, I'm losing you now I know so I'll shut up ... for now ...


Bonus version showing how much more frequently DTR is larger than Tm:


Tuesday, December 9, 2014

What has changed since the version 1 release of the Databank?

It has been nearly six months since we released the first version of the databank. While this was a big achievement for the International Surface Temperature Initiative, our work is not done. We have taken on many different tasks since the release, and a brief description is below:

Monthly Update System
As described in this post, we have implemented a monthly update system appending near real time (NRT) data into the databank. On the 5th of each month 4 sources (ghcnd, climat-ncdc, climat-uk, mcdw-unpublished) update their Stage 1 data, and on the 11th, their common formatted data (Stage 2) are then updated. In addition, an algorithm is applied appending new data to the recommended merge, and that is updated on the 11th as well.

Bug Fixes
Users have submitted some minor issues with version 1. Some stations in Serbia were given a country code of "RB" when they should have been given "RI." These have been addressed, and a new version of the databank (v1.0.1) was released.

There have been concerns about how the station name is displayed. Non-ASCII characters pose problems with some text interpreters. A module has been created in the Stage 1 to Stage 2 conversion scripts where these characters are either changed or removed to avoid this problem in the future.
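As an illustration of the "changed or removed" idea, here is a minimal Python sketch. It is not the module from the conversion scripts, whose details are not described here; the function name is hypothetical and the approach (Unicode decomposition followed by dropping anything that is still not ASCII) is just one common way to do it.

```python
import unicodedata

def to_ascii_station_name(name: str) -> str:
    """Swap accented letters for their base letters where possible and drop the rest."""
    # NFKD splits characters such as "ü" into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Encoding with errors="ignore" discards every remaining non-ASCII code point.
    return decomposed.encode("ascii", "ignore").decode("ascii")

if __name__ == "__main__":
    for station in ["Zürich", "Besançon", "Tromsø"]:
        print(station, "->", to_ascii_station_name(station))
```

Letters with no decomposition (the "ø" in "Tromsø", for example) are simply dropped by this sketch, so a production version would probably want an explicit replacement table for those.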

Of course, issues could still exist; if you find any, please let us know! As an open and transparent initiative, we encourage constructive criticism and will apply any reasonable suggestions to future versions.

New Sources
We have acquired new sources that will be added as Stage 1 and Stage 2 data soon, including
  • 300 UK Stations from the Met Office
  • German data released by DWD
  • EPA's Oregon Crest to Coast Dataset
  • LCA&D: Latin American Climate Assessment and Dataset
  • Daily Chinese Data
  • NCAR Surface Libraries
  • Stations from Meteomet project
  • Libya Stations sent by their NMS
  • C3/EURO4M Stations
  • Additional Digitized Stations from the University of Giessen
  • Homogenized Iranian Data
It is not too late to submit new data. If you have a lead on sources please let us know at We will freeze the sources again on February 28th, 2015, in order to work on the next version of the merge.

Friday, December 5, 2014

Discovering NCDC's hard copy holdings

Update Dec 11th: permanent link with some browser issues resolved at

NOAA's National Climatic Data Center have undertaken an inventory of their substantial basement holdings of hard copy data. These include a rich mix of data types on varied media including paper, fiche and microfilm.

One row of several dozen in the NCDC archive of hard copy paper holdings from around the world

Microfilm holdings arising from Europe over the second world war

Some, but far from all, of this data has been imaged and / or digitized. NCDC have now released the catalogue online and made it searchable. The catalogue interface can be found at (click on search records). The degree to which a given holding has been catalogued varies, but this is a good place to at least begin to ascertain what holdings exist and what their status is. For example, searching on American Samoa as country provides a list of holdings, most of which are hard copy only.
Example search results for American Samoa
For those interested in aspects of data rescue, this is likely to be a useful tool to ascertain whether NCDC hold any relevant records. By reasonable estimates at least as much data exists in hard copy / imaged format as has been digitised for the pre-1950 period. That is a lot of unknown knowns and could provide such rich information to improve understanding ...

Wednesday, November 26, 2014

A set of flyers for promoting the initiative's aims and outcomes

We have produced a set of one-sider flyers to promote the initiative and its aims and to try to engender additional inputs, collaborations and contributions. These will be taken by Kate Willett to the forthcoming COP meeting in Peru next month.

We strongly encourage use of these flyers at appropriate venues to support the further advancement of our work.

The set of flyers can be found at There are flyers on:
[links are to pdf versions]

Our more eagle-eyed readers will have noted above a new strand to our work. I am delighted to say that we have, following the most recent steering committee call, formally recognized the efforts led by Victor Venema and Renate Auchmann to populate and exploit a database of parallel measurements by instigating a new expert team under the databank working group. We shall do all we can to support this important effort and in the first instance we encourage readers to help us in the identification and collection of such holdings.

A stub page is available at which we shall populate over the coming months. In the meantime more information on this effort can be found at

Wednesday, November 5, 2014

Release of a daily benchmark dataset - version 1

The ISTI benchmark working group includes a PhD student looking at benchmarking daily temperature homogenisation algorithms. This largely follows the concepts laid out in the benchmark working group's publication. Significant progress has been made in this field. This post announces the release of a small daily benchmark dataset focusing on four regions in North America. These regions can be seen in Figure 1. 

Figure 1 Station locations of the four benchmark regions. Blue stations are in all worlds. Red stations only appear in worlds 2 and 3.

These benchmarks have similar aims to the global benchmarks that are currently being produced by the ISTI working group, namely to:

  1. Assess the performance of current homogenisation algorithms and provide feedback to allow for their improvement 
  2. Assess how realistic the created benchmarks are, to allow for improvements in future iterations 
  3. Quantify the uncertainty that is present in data due to inhomogeneities both before and after homogenisation algorithms have been run on them

A perfect algorithm would return the inhomogeneous data to their clean form – correctly identifying the size and location of the inhomogeneities and adjusting the series accordingly. The inhomogeneities that have been added will not be made known to the testers until the completion of the assessment cycle – mid 2015. This is to ensure that the study is as fair as possible with no testers having prior knowledge of the added inhomogeneities.

The data are formed into three worlds, each consisting of the four regions shown in Figure 1. World 1 is the smallest and contains only those stations shown in blue in Figure 1; Worlds 2 and 3 are the same size as each other and contain all the stations shown.

Homogenisers are requested to prioritise running their algorithms on a single region across worlds instead of on all regions in a single world. This will hopefully maximise the usefulness of this study in assessing the strengths and weaknesses of the process. The order of prioritisation for the regions is Wyoming, South East, North East and finally the South West.

This study will be more effective the more participants it has and if you are interested in participating please contact Rachel Warren (rw307 AT The results will form part of a PhD thesis and therefore it is requested that they are returned no later than Friday 12th December 2014. However, interested parties who are unable to meet this deadline are also encouraged to contact Rachel.

There will be a further smaller release in the next week that is just focussed on Wyoming and will explore climate characteristics of data instead of just focusing on inhomogeneity characteristics.

Monday, October 6, 2014

A framework for benchmarking of homogenisation algorithm performance on the global scale - Paper now published

The ISTI benchmarking working group have just had their first benchmarking paper accepted at Geoscientific Instrumentation, Methods and Data Systems:

Willett, K., Williams, C., Jolliffe, I. T., Lund, R., Alexander, L. V., Brönnimann, S., Vincent, L. A., Easterbrook, S., Venema, V. K. C., Berry, D., Warren, R. E., Lopardo, G., Auchmann, R., Aguilar, E., Menne, M. J., Gallagher, C., Hausfather, Z., Thorarinsdottir, T., and Thorne, P. W.: A framework for benchmarking of homogenisation algorithm performance on the global scale, Geosci. Instrum. Method. Data Syst., 3, 187-200, doi:10.5194/gi-3-187-2014, 2014.

Benchmarking, in this context, is the assessment of homogenisation algorithm performance against a set of realistic synthetic worlds of station data where the locations and size/shape of inhomogeneities are known a priori. Crucially, these inhomogeneities are not known to those performing the homogenisation, only to those performing the assessment. Assessment of the ability of algorithms both to find changepoints and to accurately return the synthetic data to its clean form (prior to addition of inhomogeneity) has three main purposes:

      1) quantification of uncertainty remaining in the data due to inhomogeneity
      2) inter-comparison of climate data products in terms of fitness for a specified purpose
      3) providing a tool for further improvement in homogenisation algorithms

Here we describe what we believe would be a good approach to a comprehensive homogenisation algorithm benchmarking system. This includes an overarching cycle of: benchmark development; release of formal benchmarks; assessment of homogenised benchmarks and an overview of where we can improve for next time around (Figure 1).

Figure 1 Overview of the ISTI comprehensive benchmarking system for assessing performance of homogenisation algorithms. (Fig. 3 of Willett et al., 2014)

There are four components to creating this benchmarking system. 

Creation of realistic clean synthetic station data

Firstly, we must be able to synthetically recreate the 30000+ ISTI stations such that they have the same variability, auto-correlation and interstation cross-correlations as the real data but are free from systematic error. In other words, they must contain a realistic seasonal cycle and features of natural variability (e.g., ENSO, volcanic eruptions etc.). There must be a realistic persistence month-to-month in each station and geographically across nearby stations.
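To give a flavour of what "realistic persistence" means for a single station, here is a deliberately simple Python sketch. It is not the method used by the working group: it assumes a cosine seasonal cycle plus first-order autoregressive (AR(1)) anomalies, every parameter value is invented for illustration, and it ignores the much harder problem of cross-correlation between neighbouring stations.

```python
import numpy as np

def synthetic_monthly_series(n_years=50, amplitude=10.0, phi=0.3, sigma=1.0, seed=0):
    """Return (seasonal cycle, AR(1) anomalies) for one hypothetical station.

    phi is the assumed month-to-month persistence of the anomalies and
    sigma the standard deviation of the random monthly shocks.
    """
    rng = np.random.default_rng(seed)
    n_months = n_years * 12
    months = np.arange(n_months)
    # Idealised seasonal cycle that repeats exactly every 12 months.
    seasonal = amplitude * np.cos(2.0 * np.pi * (months % 12) / 12.0)
    # AR(1) anomalies: each month remembers a fraction phi of the previous month.
    noise = rng.normal(0.0, sigma, n_months)
    anomalies = np.zeros(n_months)
    for t in range(1, n_months):
        anomalies[t] = phi * anomalies[t - 1] + noise[t]
    return seasonal, anomalies

if __name__ == "__main__":
    seasonal, anomalies = synthetic_monthly_series()
    clean = seasonal + anomalies
    lag1 = np.corrcoef(anomalies[:-1], anomalies[1:])[0, 1]
    print(f"{clean.size} months, lag-1 autocorrelation of anomalies: {lag1:.2f}")
```

The lag-1 autocorrelation of the anomalies should come out close to the chosen phi, which is the kind of property a clean synthetic world would need to share with the real station it imitates.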

Creation of realistic error models to add to the clean station data

The added inhomogeneities should cover all known types of inhomogeneity in terms of their frequency, magnitude and seasonal behaviour. For example, inhomogeneities could be any or a combination of the following:

     - geographically or temporally clustered due to events which affect entire networks or regions (e.g. change in observation time);
     - close to end points of time series;
     - gradual or sudden;
     - variance-altering;
     - combined with the presence of a long-term background trend;
     - small or large;
     - frequent;
     - seasonally or diurnally varying.
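Two of the simplest cases in the list above, a sudden step and a gradual drift, can be sketched in Python as follows. This is an illustration only and not the benchmark error model: the function names, positions and magnitudes are all hypothetical.

```python
import numpy as np

def add_step(series, start, size):
    """Sudden inhomogeneity: shift every value from index `start` onwards by `size`."""
    out = np.array(series, dtype=float)
    out[start:] += size
    return out

def add_gradual(series, start, end, total):
    """Gradual inhomogeneity: ramp linearly from 0 to `total` over [start, end), then hold."""
    out = np.array(series, dtype=float)
    out[start:end] += np.linspace(0.0, total, end - start)
    out[end:] += total
    return out

if __name__ == "__main__":
    clean = np.zeros(120)                      # ten years of clean monthly anomalies
    stepped = add_step(clean, 60, 0.8)         # e.g. an abrupt change part-way through
    drifting = add_gradual(clean, 24, 96, -0.5)  # e.g. a slow drift over six years
    print(stepped[58:62], drifting[94:98])
```

Combining several such perturbations, with sizes and timings hidden from the people running the algorithms, is the basic idea behind turning a clean world into a benchmark world.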

Design of an assessment system

Assessment of the homogenised benchmarks should be designed with the three purposes of benchmarking in mind. Both the ability to correctly locate changepoints and to adjust the data back to its homogeneous state are important. The assessment can be split into four different levels:

     - Level 1: The ability of the algorithm to restore an inhomogeneous world to its clean world state in terms of climatology, variance and trends.

     - Level 2: The ability of the algorithm to accurately locate changepoints and detect their size/shape.

     - Level 3: The strengths and weaknesses of an algorithm against specific types of inhomogeneity and observing system issues.

     - Level 4: A comparison of the benchmarks with the real world in terms of detected inhomogeneity both to measure algorithm performance in the real world and to enable future improvement to the benchmarks.

The benchmark cycle

This should all take place within a well laid out framework to encourage people to take part and make the results as useful as possible. Timing is important. Too long a cycle will mean that the benchmarks become outdated. Too short a cycle will reduce the number of groups able to participate.

Producing the clean synthetic station data on the global scale is a complicated task that has now taken several years, but we are close to completion of a version 1. We have collected together a list of known region-wide inhomogeneities and a comprehensive understanding of the many different types of inhomogeneities that can affect station data. We have also considered a number of assessment options and decided to focus on levels 1 and 2 for assessment within the benchmark cycle. Our benchmarking working group is aiming for release of the first benchmarks by January 2015.

Friday, September 12, 2014

The Databank Near Real Time Update System

Since the official release back in June, we have worked to keep the databank updated with the most recent data. Each month we will post new data from sources that update in near-real-time (NRT), along with an updated version of the recommended merge with the latest data appended. Stage 1 data (digitized in its original form) will be updated no later than the 5th of each month, and then Stage 2 (common formatted data) and Stage 3 (merged record) data will be updated no later than the 11th of the month.

So what data gets updated in our NRT system? We have identified four sources that provide updated data within the first few days of the month. They are the CLIMAT streams from NCDC and from the UK, the unpublished form of the Monthly Climatic Data for the World (MCDW) and finally GHCN-D. As in the merge program, a hierarchy determines which source's data are appended when sources conflict. The hierarchy is here:

4) MCDW-Unpublished

An overview of the system is shown in this flow diagram:

The algorithm to append data looks for station matches through the same metadata tests as described in the merge program. These include geographic distance, height distance, and station name similarity using the Jaccard Index. If the metadata metric is good, then an ID test is used to determine station match. Because the four input sources have either a GHCN-D or WMO ID, the matching is much easier here than in the merge program. Once a station match is found, new data from the past few months are appended. Throughout this process, no new stations are added.
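For readers unfamiliar with the Jaccard Index, here is a minimal Python sketch of a name-similarity score. It is not taken from the merge program: the choice to compare sets of two-character fragments (bigrams) of the upper-cased name is an assumption made for illustration, and the function names are hypothetical.

```python
def bigrams(name):
    """Set of two-character fragments of a station name, ignoring case and whitespace."""
    cleaned = "".join(name.upper().split())
    return {cleaned[i:i + 2] for i in range(len(cleaned) - 1)}

def jaccard(name_a, name_b):
    """Jaccard Index: size of the intersection divided by size of the union (0 to 1)."""
    set_a, set_b = bigrams(name_a), bigrams(name_b)
    if not set_a and not set_b:
        return 1.0  # two names too short to form bigrams are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)

if __name__ == "__main__":
    print(jaccard("Asheville Regional AP", "ASHEVILLE REGIONAL AIRPORT"))
    print(jaccard("Asheville", "Zurich"))
```

A score of 1 means the two names share exactly the same fragments and 0 means they share none, so a threshold somewhere in between can feed into the overall metadata metric alongside the distance and height tests.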

We have had two monthly updates so far. As always the latest recommended merge data can be found on our ftp page here, along with older data placed in the archive here. Note that we are only updating the recommended merge, and not the variants. In addition, the merge metadata is not updated, because no new merge has been applied yet. We plan to have another merge out sometime in early 2015.

Friday, August 29, 2014

ccc-gistemp and ISTI

This is a guest post by David Jones of the Climate Code Foundation. It is a mirror of their post at

ccc-gistemp is Climate Code Foundation’s rewrite of the NASA GISS Surface Temperature Analysis GISTEMP. It produces exactly the same result, but is written in clear Python.

I’ve recently modified ccc-gistemp so that it can use the dataset recently released by the International Surface Temperature Initiative. Normally ccc-gistemp uses GHCN-M, but the ISTI dataset is much larger. Since ISTI publish the Stage 3 dataset in the same format as GHCN-M v3 the required changes were relatively minor, and Climate Code Foundation appreciates the fact that ISTI is published in several formats, including GHCN-M v3.

The ISTI dataset is not quality controlled, so, after re-reading section 3.3 of Lawrimore et al 2011, I implemented an extremely simple quality control scheme, MADQC. In MADQC a data value is rejected if its distance from the median (for its station’s named month) exceeds 5 times the median absolute deviation (MAD, hence MADQC); any series with fewer than 20 values (for each named month) is rejected.
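The rule described above is short enough to sketch in a few lines of Python. This is an illustrative reconstruction from the description in this post, not the actual ccc-gistemp source: the function name and the use of NaN to mark rejected values are assumptions.

```python
import numpy as np

def madqc(values, threshold=5.0, min_count=20):
    """Apply the MAD test to all values for one station and one named month.

    Rejected values are returned as NaN. If fewer than `min_count` values are
    present, the whole series for that month is rejected.
    """
    values = np.asarray(values, dtype=float)
    valid = values[~np.isnan(values)]
    if valid.size < min_count:
        return np.full_like(values, np.nan)
    median = np.median(valid)
    mad = np.median(np.abs(valid - median))  # median absolute deviation
    keep = np.abs(values - median) <= threshold * mad
    return np.where(keep, values, np.nan)

if __name__ == "__main__":
    januaries = [10.0 + 0.1 * i for i in range(30)] + [50.0]  # one gross error
    print(np.isnan(madqc(januaries)).sum(), "value(s) rejected")
```

Because both the median and the MAD are insensitive to a handful of wild values, the gross error itself barely affects the threshold used to reject it, which is what makes such a simple scheme workable.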

So far I’ve found MADQC to be reasonable at rejecting the grossest non-climatic errors.

Let’s compare the ccc-gistemp analysis using the ISTI Stage 3 dataset versus using the GHCN-M QCU dataset. The analysis for each hemisphere:

For both hemispheres the agreement is generally good and certainly within the published error bounds.

Zooming in on the recent period:

Now we can see the agreement in the northern hemisphere is excellent. In the southern hemisphere agreement is very good. The trend is slightly higher for the ISTI dataset.

The additional data that ISTI has gathered is most welcome, and this analysis shows that the warming trend in both hemispheres was not due to choosing a particular set of stations for GHCN-M. The much more comprehensive station network of ISTI shows the same trends.

Thursday, July 24, 2014

The WMO Commission for Climatology meeting and developments on the WMO front

The World Meteorological Organization’s Commission for Climatology had its four-yearly meeting in Heidelberg, Germany, from 3-8 July, preceded by a Technical Conference from 30 June – 2 July. The Commission is the central body for climate-related activities in WMO, and has a major role in establishing international standards and setting international work programs in the climate field, particularly through setting up networks of Expert Teams and Task Teams to work on particular issues. Its President (re-elected at the meeting) is Tom Peterson of NCDC, who will be well-known to many of you. The International Surface Temperature Initiative was set up as the result of a resolution of the last Commission for Climatology meeting, in 2010.

I made a presentation to the Technical Conference on the current status of ISTI. By happy coincidence, this presentation was scheduled for the morning of 1 July, a few hours after the release of the first version of the ISTI databank. The presentation appeared to be well-received; there were few direct questions or follow-ups, but the pile of leaflets we brought describing ISTI (once they got there, after a couple of bonus days enjoying Berlin with the rest of my luggage) was a lot smaller at the end of the week than it was at the start. One particular reason for targeting the Commission audience is that many of the attendees at Commission meetings are senior managers in their national meteorological services (often the head of the climate division, or equivalent), and so potentially have more influence over decisions to make data available to projects such as ISTI than individual scientists would.

Slow progress is also being made in two other areas of WMO of interest to ISTI. The inclusion of at least some historic climate data amongst the set of products which countries agree to freely exchange has been a long-standing goal of ours. The key decisions on this will be made at the full WMO Congress, which will be held next year, but progress to date (including through the recent WMO Executive Council meeting) is encouraging. There are also moves to include the month’s daily data in monthly CLIMAT messages, which are the principal means of exchanging current climate data through the WMO system but currently only contain monthly data. This will be very useful for the ongoing updating of data sets, as it will make daily data available which can be assumed to be for a full 24-hour day and is likely to have received at least some quality control (neither of which is necessarily true for the real-time synoptic reports which are the primary current source of recent daily and sub-daily data). Considerable technical work remains to be done, though, to implement this, even once it is formally endorsed.

Data rescue and climate database systems continue to be a high priority of the Commission, with several initiatives outlined at the meeting. Among them are proposals for an international data rescue portal, which (among other things) would potentially facilitate crowd-sourced digitisation. It is, however, an indication of how much work still remains to be done in many parts of the world that, according to results of a survey reported at the meeting, 25% of responding countries still stored their country’s climate data in spreadsheets or flat files, and 40% had a climate database system which was not fully functioning or not functioning at all.

The Commission also agreed to establish a new Task Team on Homogenisation. The full membership (and chairing) of this group are not yet clear but I will almost certainly be part of it. This team will be working closely with ISTI, but will also have a major focus on supporting the implementation of homogenised data sets which contribute to operational data products nationally and internationally.

Also of interest to ISTI is a new WMO initiative to formally recognise “centennial stations”, which, as the name implies, are stations which have existed with few or no changes for 100 years or more. Countries are to be asked to identify such stations, whose data will clearly be of considerable value to ISTI, if not already part of our databank. Free access to data and relevant metadata are among the recommendations for centennial stations.

And one advantage of holding an international meeting during the World Cup: it provides an instant conversation-starter with delegates of almost any country. (Perhaps fortunately for the Brazilian delegation, the meeting finished just before the semi-finals).

(Update 5 August: the resolution which came out of the WMO Executive Council meeting is available at