Wednesday, May 25, 2016

Re-examining changes in Diurnal Temperature Ranges

A pair of papers recently published in JGR ($, sorry), reassessing observed changes in Diurnal Temperature Range (DTR), has been highlighted in EOS and Nature Climate Change.

The analyses have been extremely long in the making. They started out back in 2010 as 'hobby' papers and never got explicit funding so trundled along very very slowly indeed. The release of the ISTI databank provided an opportunity to create a new estimate of DTR changes and compare it to several pre-existing estimates.

The first paper details the construction of the new dataset of DTR changes. This takes the version 1 release of the ISTI databank and applies the pairwise homogenisation algorithm (PHA) used by NOAA NCEI to these holdings. The paper deals with the homogenisation processing, analyses the resulting dataset estimates and discusses aspects of the underlying metrology (not a typo). Below are the gridded trends over 1951-2012 and the global timeseries. The 'raw' data is the basic data held in the databank. Directly adjusted is where the DTR series were presented to the PHA algorithm. Indirectly adjusted is where, instead, the adjustments to Tmin and Tmax are used.
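To make the distinction between the direct and indirect routes concrete, here is a minimal sketch. All the numbers are invented and the adjustment lists merely stand in for whatever a homogenisation algorithm might return; nothing here is the PHA code or its output.

```python
# Toy monthly series (degrees C); every value is invented for illustration.
tmax = [20.0, 21.0, 22.0, 23.0]
tmin = [10.0, 10.5, 11.0, 11.5]

# Hypothetical adjustments standing in for homogenisation output.
adj_tmax = [0.5, 0.5, 0.0, 0.0]      # found by searching the Tmax series
adj_tmin = [-0.25, -0.25, 0.0, 0.0]  # found by searching the Tmin series
adj_dtr = [0.5, 0.5, 0.0, 0.0]       # found by searching the DTR series itself


def raw_dtr(tmax, tmin):
    """DTR is simply Tmax minus Tmin."""
    return [tx - tn for tx, tn in zip(tmax, tmin)]


def directly_adjusted_dtr(tmax, tmin, adj_dtr):
    """Direct route: adjustments estimated on the DTR series are applied to it."""
    return [d + a for d, a in zip(raw_dtr(tmax, tmin), adj_dtr)]


def indirectly_adjusted_dtr(tmax, tmin, adj_tmax, adj_tmin):
    """Indirect route: adjust Tmax and Tmin separately, then difference them."""
    return [(tx + ax) - (tn + an)
            for tx, ax, tn, an in zip(tmax, adj_tmax, tmin, adj_tmin)]


print(raw_dtr(tmax, tmin))
print(directly_adjusted_dtr(tmax, tmin, adj_dtr))
print(indirectly_adjusted_dtr(tmax, tmin, adj_tmax, adj_tmin))
```

The two routes need not agree, because the breaks found in the DTR series are not guaranteed to match the difference of the breaks found in Tmax and Tmin.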

We found that more breaks are returned for DTR than for Tmax or Tmin, which in turn return more breaks than Tmean. This has potential implications for future homogenisation strategies, in that searching for breaks in Tmean appears sub-optimal. Potential reasons for this were detailed in a prior ISTI blogpost and are elaborated upon in the paper itself.

The second paper takes the new analysis and compares it to several pre-existing analyses and then attempts to reformulate the findings on DTR from the IPCC Fifth Assessment Report (which assessed only medium confidence). The new analyses provide considerable confidence in a finding that DTR has decreased globally since the mid-twentieth Century, with most of that decrease occurring prior to 1980. Data are too sparse and uncertain to make meaningful conclusions about DTR changes prior to the mid-twentieth Century, at least globally. The compared datasets show very distinct coverage and somewhat divergent trends since the mid-twentieth Century:

Much of the divergence between estimates results from the disparate approaches taken to accounting for incomplete sampling by the underlying data through interpolating (or not) into data sparse regions. Using the native coverage (top) or the estimates restricted to common coverage (bottom) greatly alters the perceived degree of agreement between the independently produced products from various groups:
The conclusion of the second paper was as follows:

The driving rationale behind this work was the lack of explicit progress in the literature in assessing DTR changes between the fourth and fifth assessment reports of the IPCC. Based upon the findings herein, were a new assessment to be performed by IPCC of the observational DTR record at this time the text might read as follows (use of IPCC carefully calibrated uncertainty language and italicization [Mastrandrea et al., 2010] is intended).

It is virtually certain that globally averaged DTR has significantly decreased since 1950. This reduction in DTR is robust to both choice of data set and to reasonable variations in station selection and gridding methodology. However, differences between available estimates mean that there is only medium confidence in the magnitude of the DTR reductions. It is likely that most of the global-mean decrease occurred between 1960 and 1980 and that since then globally averaged DTR has exhibited little change. Because of current data sparsity in the digitized records, there is low confidence in trends and multidecadal variability in DTR prior to the middle twentieth century. It is likely that considerable pre-1950 data exist that could be shared and/or rescued and used in future analyses. All assessed estimates of global DTR changes are substantially smaller than the concurrently observed increases in mean and maximum and minimum temperatures (high confidence, virtually certain).

The datasets and code used are available via

Thursday, October 15, 2015

Global Land Surface Databank: Version 1.1.0 Release

In June 2014, the first version of the global databank was released (Rennie et al., 2014), which included data from nearly 50 different sources and an algorithm to resolve duplicate stations and piece together complete temperature time series. Since then, there have been monthly updates, appending new data to existing stations. Thanks to user feedback, along with additional analysis described below, minor changes were introduced and implemented to the merge program to ensure the most accurate data were incorporated in the final product. This, along with updates to current sources required a small change to the versioning system. The remainder of this post will highlight the changes implemented in the global land surface databank, version 1.1.0. More information about the structure of the databank, including sources, formats, and merge algorithm, can be found on the databank website (

Updates to Stage 1 and Stage 2 Data 
The databank design includes six data Stages, starting from the original observation to the final quality controlled and bias corrected products. For the purposes of this update, only three stages were modified: digitized data (Stage One), data converted to a common format (Stage Two), and the merged dataset (Stage Three).

The highest priority source comes from the Global Historical Climatology Network – Daily (GHCN-D) dataset (Menne et al. 2012). In June 2015, GHCN-D underwent a large update, which included a new average temperature element (TAVG), along with the addition of 1,400 stations that are a part of the World Meteorological Organization’s (WMO) Regional Basic Climatological Network (RBCN). Because these stations are important for real time updates, it was necessary to include this new version in the latest merge.

Further assessment was also done on one of our sources known as “russsource.” This source contained over 36,000 stations reporting maximum and minimum temperature. While the original format was consistent across all stations, it was discovered that this source included 27 individual sources. It was decided to split these sources up and place them individually in the merge following the source hierarchy defined by the databank working group. Because of some duplication with sources used in GHCN-D, only 20 of the 27 sources were included. In addition, station IDs were brought into the Stage Two data, so that the merge’s ID test could be implemented. The same was done for the source known as “ghcnsource.”

Other than the above, no additional sources were added to the source hierarchy. One source, however, was removed (crutem4). Its data have been changed by bias corrections, and using these stations as a last resort was causing them to be added as unique stations. Candidate stations from crutem4 were matched with their respective target stations by the metadata tests, but were judged unique by the data tests because of these corrections. In order to avoid excessive station duplication, this source was removed.

Changes to Merge Algorithm
The merge algorithm, as described by Rennie et al. 2014, underwent no code changes. However, a couple of thresholds were modified in order to maximize the amount of data the final recommended product would have. The thresholds are defined in a configuration file that is required for the program to run successfully.

The first step of the merge algorithm takes into account the metadata between a target and candidate station, including the station's latitude, longitude, elevation and name. A quasi-probabilistic comparison is made and the result is a metadata metric between 0 and 1. In version 1.0.0, this metric needed to pass a threshold of 0.50 in order to be considered for merging. Analysis showed that too many stations were being pulled through, forcing merges between stations that shouldn’t have been merged. As a result, a stricter threshold of 0.75 was applied, in order to avoid this issue.
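As a rough illustration of what the stricter threshold does, the sketch below filters a few candidates by an already-computed metadata metric. The station labels and metric values are invented, the function name is hypothetical, and treating "pass" as greater-than-or-equal is an assumption; the quasi-probabilistic comparison itself is not reproduced here.

```python
# Thresholds quoted in the post for the two databank versions.
METADATA_THRESHOLD_V100 = 0.50
METADATA_THRESHOLD_V110 = 0.75


def passes_metadata_test(metric, threshold):
    """Return True if a metadata metric in [0, 1] reaches the threshold.

    Assumption: 'pass' is taken to mean metric >= threshold.
    """
    if not 0.0 <= metric <= 1.0:
        raise ValueError("metadata metric must lie between 0 and 1")
    return metric >= threshold


# Hypothetical candidate stations with made-up metadata metrics.
candidates = {"A": 0.92, "B": 0.61, "C": 0.48}

kept_v100 = sorted(name for name, metric in candidates.items()
                   if passes_metadata_test(metric, METADATA_THRESHOLD_V100))
kept_v110 = sorted(name for name, metric in candidates.items()
                   if passes_metadata_test(metric, METADATA_THRESHOLD_V110))

print(kept_v100)  # candidates considered for merging under v1.0.0
print(kept_v110)  # candidates considered for merging under v1.1.0
```

In this toy case candidate B would have gone forward to the data tests under the old threshold but is held back under the new one.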

In addition, once a candidate station is chosen to merge with a target station, it needs to fill in a gap of at least 60 months (5 years) in order to be added to the target station. It was determined that this gap was too large, and target stations with short gaps in their data were not being filled in by qualifying candidate stations. This gap threshold has been reduced to 12 months as a result.
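The effect of the gap threshold can be sketched as follows. This is only one plausible reading, in which the "gap filled" is counted as the number of months the candidate has that the target lacks; the helper names are hypothetical and the station periods are invented.

```python
def month_range(start_year, start_month, n_months):
    """Return a set of (year, month) tuples covering n_months from the start."""
    months = set()
    year, month = start_year, start_month
    for _ in range(n_months):
        months.add((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def months_filled(target_months, candidate_months):
    """Count months the candidate would add to the target (assumed definition)."""
    return len(candidate_months - target_months)


def fills_enough(target_months, candidate_months, min_gap_months):
    """Apply the gap threshold: 60 months in v1.0.0, 12 months in v1.1.0."""
    return months_filled(target_months, candidate_months) >= min_gap_months


# Invented example: the target has 1950-1954 and 1957-1959, i.e. a 24 month gap.
target = month_range(1950, 1, 60) | month_range(1957, 1, 36)
# The candidate covers 1954-1957 and so spans the whole gap.
candidate = month_range(1954, 1, 48)

print(months_filled(target, candidate))
print(fills_enough(target, candidate, 60))  # v1.0.0 threshold
print(fills_enough(target, candidate, 12))  # v1.1.0 threshold
```

Here the candidate would have been rejected under the 60 month rule even though it fills the target's two year gap completely.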

Similar to version 1.0.0, all decisions made were tested against an independent dataset generated from hourly data for US stations available in the Integrated Surface Dataset (Smith et al. 2011). Results show only a small change between the two versions.

Version 1.1.0 of the recommended merge contains 35,932 stations (Figure 1), nearly 4,000 stations more than v1.0.0 (32,142). Figure 2 shows that the additional stations fall mostly in the most recent period, with a relative increase of about 10% in the number of stations since 1950. It should be noted that there is a drop in coverage prior to 1950 with the new version. However, it is the author’s opinion that this was caused by removing crutem4 as one of the sources. Including this source had made candidate stations unique, due to differences in its data as a result of the data provider's bias corrections. While the number of stations is lower during this time period for v1.1.0, it should be noted that the number of gridboxes used in analysis (Figure 3) was either equal to, or slightly higher than, v1.0.0.

Stage Three normally includes a merge recommended and endorsed by ISTI, along with variants showing the structural uncertainty of the algorithm. Due to time constraints, these variants are not yet available, but will be provided at a later date.

Figure 1: Location of all stations in the recommended Stage Three component of the databank. The color corresponds to the number of years of data available for each station. Stations with longer periods of record mask stations with shorter periods of record when they are in approximately identical locations.

Figure 2: Station count of recommended merge v1.1.0 by year from 1850 to 2014, compared to version 1.0.0, along with GHCN-M version 3.

Figure 3: Percentage of global coverage with respect to 5 degree gridboxes for the recommended merge v1.1.0 by year from 1850 to 2014, compared to version 1.0.0, along with GHCN-M version 3.
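For readers wanting to reproduce something like the coverage statistic in Figure 3, a minimal sketch is given below. It assumes coverage is a simple count of occupied 5 degree gridboxes out of all 36 x 72 boxes, with no area weighting and no land mask; the calculation behind the figure may well differ. The station coordinates are invented.

```python
def gridbox_index(lat, lon, size=5.0):
    """Map a latitude/longitude (degrees) to a (row, column) gridbox index."""
    n_lat = int(180 / size)
    # Clamp so that the North Pole falls in the top row rather than beyond it.
    row = min(int((lat + 90.0) // size), n_lat - 1)
    # Wrap longitude so that 180E and 180W land in the same column.
    col = int(((lon + 180.0) % 360.0) // size)
    return row, col


def occupied_boxes(stations, size=5.0):
    """Return the set of gridboxes containing at least one station."""
    return {gridbox_index(lat, lon, size) for lat, lon in stations}


def coverage_percent(stations, size=5.0):
    """Percentage of gridboxes occupied (simple count, not area weighted)."""
    total = int(180 / size) * int(360 / size)
    return 100.0 * len(occupied_boxes(stations, size)) / total


# Invented (lat, lon) pairs; the first two share a gridbox.
stations = [(51.5, -0.1), (52.0, -1.0), (40.7, -74.0)]
print(len(occupied_boxes(stations)))
print(coverage_percent(stations))
```

Counting boxes rather than stations is what allows coverage to hold steady, or even rise, while the station count falls.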

Saturday, June 6, 2015

Promotional flyers now available in German and Spanish

These have sat on my to-do-list for far too long but I have now finally found time to place the German and Spanish translations of the flyers that were taken to COP in Lima on the website. These can be found at My thanks to Enric Aguilar, Stefan Bronnimann, Renate Auchmann and Victor Venema for the significant efforts to undertake these translations. Also a big thanks is due to the NCEI graphics team for their efforts to re-render the original flyers in multiple languages.

The promotional materials are freely available and encouraged for re-use in any forum that may help raise interest in and knowledge of ISTI and its aims. Please feel free to take copies anywhere and everywhere that is relevant.

Thursday, June 4, 2015

The Karl et al. Science paper and ISTI

Note: this post is partly personal opinion.

I suspect when this is being posted at unembargo time there will be a whole slew of stories running on the news media and blogs about the Karl et al. paper in Science (I shall add a link to the actual paper if I remember later). But given the use of the ISTI databank in the analysis - its first high profile use in anger and a testament to all those years of hard work by very many colleagues (principally Jared Rennie) - some may come towards this little quiet corner of the internet. So here are some quick thoughts.

Karl et al. find greater recent period warming using a new set of land and sea surface temperature records than in the operational versions used in NCEI's monitoring products to date. They conclude that there is no statistical evidence for a slowdown in the rate of warming in the new estimate, calling into apparent question the much discussed 'hiatus'.

Firstly, to be clear, most of the change in trend documented in Karl et al. arises not from the land (the focus of ISTI) but rather from the sea surface temperature dataset changes. These changes relate to their now calculating ship bias adjustments throughout the record, and accounting for the transition from predominantly ships to predominantly buoys since the 1980s. There is no doubt that buoys read colder than ships (attested to in multiple published analyses) - so in not previously accounting for this the prior NCDC analysis had a marked propensity to underestimate sea surface temperature changes in the most recent period. There are other changes in the sea surface temperature dataset documented in Huang et al and Liu et al. These are secondary in terms of recent trends but still important for certain applications. For example, ERSSTv4 likely captures far better ENSO variations prior to 1920 or so. This, however, is a land surface air temperatures blog so I shall wax lyrical no further on the matter of SSTs. I can try to answer questions on ERSSTv4 in the comments (I was a co-author on the ERSST analyses) if you have any burning questions.

So, onto land temperatures. Karl et al. apply the pre-existing pairwise homogenization algorithm used in GHCNv3 to the databank version 1.0.1 release. Effectively this is going from considering these:

to considering these:

The effect of going from the 7,280 stations in GHCNMv3 to applying the same algorithm to the databank (although not all 32,128 stations as many were too short or isolated or incomplete - Karl et al. mentions 'double' so somewhere around 15,000 were likely used) is very much smaller than the effect of the sea surface temperature changes despite the step change in station count and coverage. The most recent period trends in Karl et al. over land exhibit a little more warming (c.10%) than GHCNv3 does, but it's not remotely statistically significant. It'll be interesting to look, down the line, at what proportion of that change arises from improved coverage and what proportion from changes in areas of common sampling, and to consider the effects on common stations and a slew of other analyses. Presumably this will be part of a broader analysis under GHCNv4 which will be built off the databank release, again using PHA. There may be additional innovations, in part arising from the SAMSI/IMAGe/ISTI workshop held in Boulder last summer.

There are two additional questions that arise:

1. Does this analysis obviate the need for ISTI?

Absolutely not.

Without ISTI the land side of Karl et al would not have been possible for starters. But more generally this is but one estimate and we most definitely need multiple estimates. We are also yet to run the PHA algorithm and others through the benchmarks through which additional insights and improvements are expected to accrue. We also know there remain lots and lots of data out there to rescue and incorporate into the holdings and use to get still better estimates of the global, regional and local changes. So, much work to be done and we have only just started to scratch the surface of what is possible.

2. Does it call into question the slew of papers on the recent hiatus / pause / slowdown?

Not really.

The NCDC estimate (and GISS which uses the same marine and land basis estimates) was already at the low-end of the family of available estimates of global mean behaviour and this simply puts them back within or just above these estimates for their trends over 1998 to 2012 / 2014. The slowdown is also less marked in all of the datasets now, in part because of the additional two and a bit years since the AR5 reported periods. Over those years we appear to be flipping to a positive IPO (this will become clearer with time), which will cause enhanced short-term surface warming.

But, in part this is a question of which hypothesis to test. Karl et al are testing whether there has been a detectable change in the observed trend behaviour. The answer is no, and pretty much was anyway according to a number of prior analyses. The modern period adjustments and innovations in Karl et al. simply strengthen that conclusion.

Arguably the more interesting hypothesis to test is whether the observations are consistent with the family of climate model projections. Here the Karl et al. adjustments take the NCDC dataset from inconsistent (3 sigma) to suspicious (2 sigma). Here I am adopting the language of the metrology Guide to the Expression of Uncertainty in Measurement for clarity: in that context the Karl et al. analysis takes us from k=3 to k=2, whereas k=1 (within 1 sigma) would be deemed consistent.
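For those unfamiliar with the k notation, here is a minimal sketch of the idea. The numbers are arbitrary and carry no relation to the actual trends or uncertainties; sigma is assumed to be a single combined standard uncertainty, and the exact boundaries between the three labels are my own assumption.

```python
def coverage_factor(observed, expected, sigma):
    """Distance between two estimates in units of the standard uncertainty."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return abs(observed - expected) / sigma


def classify(k):
    """Label a coverage factor; the cut-offs at 1 and 2 are assumed."""
    if k <= 1.0:
        return "consistent"
    if k <= 2.0:
        return "suspicious"
    return "inconsistent"


# Arbitrary illustrative values, not real trends.
k_before = coverage_factor(observed=0.5, expected=2.0, sigma=0.5)
k_after = coverage_factor(observed=1.0, expected=2.0, sigma=0.5)

print(k_before, classify(k_before))
print(k_after, classify(k_after))
```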

Furthermore, the questions of mechanistic understanding of decadal variability that all these studies have focussed upon are societally relevant and will improve our understanding of the climate system. Not only that but the insights will be used to improve climate models and therefore future predictions and projections. So, the existing literature on the topic is undoubtedly highly valuable. Doubtless there will be those saying it isn't / wasn't.

Concluding remarks

To conclude, worryingly not for the first time (think tropospheric temperatures in late 1990s / early 2000s) we find that potentially some substantial portion of a model-observation discrepancy that has caused a degree of controversy is down to unresolved observational issues. There is still an undue propensity for scientists and public alike to take the observations as a 'given'. As Karl et al. attest, even in the modern era we have imperfect measurements.

Which leads me to a final proposition for a more scientifically sane future ...

This whole train of events does rather speak to the fact that we can and should observe in a more sane, sensible and rational way in the future. There is no need to bequeath onto researchers in 50 years' time a similar mess. If we instigate and maintain reference quality networks that are stable SI traceable measures with comprehensive uncertainty chains such as USCRN, GRUAN etc. but for all domains for decades to come we can have the next generation of scientists focus on analyzing what happened and not, depressingly, trying instead to inevitably somewhat ambiguously ascertain what happened.

Saturday, February 28, 2015

Promotional flyers now available in French

International volunteers are helping to translate the promotional materials recently distributed at the COP meeting in Lima into additional languages. These will be made available through as they become available. Please distribute and use to promote the Initiative's aims and objectives at relevant venues and meetings.

With thanks to Lucie Vincent of Environment Canada and the graphics team at NOAA's National Climatic Data Center, versions in French are now available.

Tuesday, January 20, 2015

Because the POSTman always delivers ...

We recently had a full teleconference meeting of participants. If you are prone to insomnia the full minutes are available at this link.

The major news is that, after some discussions on the appropriate name for the group, ISTI does, indeed, have a new group ... the Parallel Observations Science Team (or POST) led by Victor Venema and Renate Auchmann.

You may recall a number of posts on this subject over at Victor's place. We shall work with colleagues to help further this effort. By being part of the formal ISTI family we will ensure that benefits regarding data holdings, benchmarking, and lessons learnt from this effort are more broadly shared. We always look for win-wins!

We are still looking at populating the parallel measurements database so if you know of any coincident measurements using distinct techniques or looking at spatial variability at the local scale (or both) then please do get in contact. Victor and Renate are also still populating this group (terms of reference here) so if parallel measurements are of interest and you feel you could contribute drop them a line.

More details on this effort can be found at

Thursday, January 1, 2015

Survey on national homogenised temperature data sets

I've recently run a survey on national homogenised temperature data sets. Whilst this was not an exhaustive survey (as indicated by the number of responses), it is an indication of what's out there and what resources various countries are putting into this work.

Survey reports were received from 18 countries (CHN, CAN, ISR, IRL, SUI, SLO, NOR, HUN, NED, ROM, GBR, AUT, SRB, ESP, CZE, SWE, UKR, AUS) and 1 region (Catalonia). Summary results were as follows:

1. Number of staff involved in homogenisation (full-time equivalent)

Less than 1      2 countries
1-2              9
2-4              5
4 or more        3

(global and continental data sets are excluded from this - for example, the UK have several people working on the HadCRUT data sets, and the Netherlands on ECA&D and associated projects)

2. Existence of a national homogenised data set

Yes                                           16
Yes but not yet released                      1
No national set but a station/regional set    1
No                                            1

3. Time resolution of data set

Daily                                     8
Monthly                                   7
Mix depending on element                  1
Monthly for early data, daily for later   1

4. Time resolution of adjustment

Results from this are a little unclear – several responses indicated use of the Vincent methodology, which interpolates adjustments based on monthly values to daily timescales.

Daily                                         4
Monthly                                       11
Monthly for detection, daily for adjustment   2

5. Elements included

Maximum, minimum and mean temperature   8
Maximum and minimum temperature         5
Mean temperature only                   4

(note that ‘maximum and minimum temperature’ implies mean temperature is not homogenised independently – in most cases it can still be calculated based on max/min)

6. Frequency of updating/reassessing homogeneity

Not updated                           6 (in 2 cases, the first data set has only just been completed)
Appended with unadjusted data only    2
Irregularly                           1
Annually or near-annually             4
Intervals longer than 2 years         4 (ranging from every 3 to every 10 years)

Thursday, December 11, 2014

Why we need max and min temperatures for all stations

I'm doing an analysis of Diurnal Temperature Range (DTR; more on that when published) but as part of this I just played with a little toy box model and the result is sufficiently of general interest to highlight here and maybe get some feedback.

So, for most stations in the databank we have data for maximum (Tx) and minimum (Tn) that we then average to get Tm. Now, that is not the only transform possible - there is also DTR which is Tx-Tn. Although that is not part of the databank archive it's a trivial transform. In looking at results from running NCDC's pairwise algorithm, distinct differences in breakpoint detection efficacy and adjustment distribution arise, which have caused great author team angst.

This morning I constructed a simple toy box where I just played what if. More precisely what if I allowed seeded breaks in Tx and Tn in the bound -5 to 5 and considered the break size effects in Tx, Tn, Tm and DTR:
The top two panels are hopefully pretty self explanatory. Tm and DTR effects are orthogonal which makes sense. In the lowest panel (note colours chosen from colorbrewer but please advise if issues for colour-blind folks):
red: Break largest in Tx
blue: Break largest in Tn
purple: break largest in DTR
green: break largest in Tm (yes, there is precisely no green)
Cases with breaks equal in size are no colour (infinitesimally small lines along diagonal and vertices at Tx and Tn =0)

So …

If we just randomly seeded Tx and Tn breaks in an entirely uncorrelated manner into the series then we would get 50% of breaks largest in DTR and 25% each in Tx and Tn. DTR should be broader in its overall distribution and Tm narrower with Tx and Tn intermediate.

If we put in correlated Tx and Tn breaks such that they were always same sign (but not magnitude) then they would always be largest in either Tx or Tn (or equal with Tm when Tx=Tn).

If we put in anti-correlated breaks then they would always be largest in DTR.

Perhaps most importantly, as alluded to above, breaks will only be equal largest for Tm in a very special set of cases where Tx break = Tn break. Breaks, on average, will be smallest in Tm. If breakpoint detection and adjustment is a signal to noise problem it's not sensible to look where the signal is smallest. This has potentially serious implications for our ability to detect and adjust for breakpoints if we limit ourselves to Tm and is why we should try to rescue Tx and Tn data for the large amount of early data for which we only have Tm in the archives.
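For anyone wanting to play along at home, the toy box is easy to reproduce. The sketch below is a re-creation under stated assumptions, not the script used for the plots: breaks in Tx and Tn are drawn independently and uniformly from -5 to 5, the Tm break is taken as the mean of the two and the DTR break as their difference.

```python
import random


def largest_element(tx_break, tn_break):
    """Name the element in which the break is largest in absolute size."""
    sizes = {
        "Tx": abs(tx_break),
        "Tn": abs(tn_break),
        "Tm": abs(tx_break + tn_break) / 2.0,  # Tm = (Tx + Tn) / 2
        "DTR": abs(tx_break - tn_break),       # DTR = Tx - Tn
    }
    top = max(sizes.values())
    winners = [name for name, size in sizes.items() if size == top]
    return winners[0] if len(winners) == 1 else "tie"


def simulate(n, seed=0, bound=5.0):
    """Fraction of uncorrelated random break pairs that are largest in each element."""
    rng = random.Random(seed)
    counts = {"Tx": 0, "Tn": 0, "Tm": 0, "DTR": 0, "tie": 0}
    for _ in range(n):
        tx_break = rng.uniform(-bound, bound)
        tn_break = rng.uniform(-bound, bound)
        counts[largest_element(tx_break, tn_break)] += 1
    return {name: count / n for name, count in counts.items()}


fractions = simulate(100000, seed=1)
for name, fraction in fractions.items():
    print(name, round(fraction, 3))
```

The fractions should come out close to one half for DTR and one quarter each for Tx and Tn, with none for Tm, in line with the reasoning above.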

Maybe in future we can consider this as an explicitly joint estimation problem of finding breaks in the two primary elements and two derived elements and then constructing physically consistent adjustment estimates from the element-wise CDFs. Okay, I'm losing you now I know so I'll shut up ... for now ...


Bonus version showing how much more frequently DTR is larger than Tm:
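As a rough numerical check on that comparison, the sketch below estimates how often the DTR break exceeds the Tm break. It assumes, as in the toy box, independent uniform breaks in Tx and Tn between -5 and 5; the function name is my own.

```python
import random


def dtr_exceeds_tm_fraction(n, seed=0, bound=5.0):
    """Fraction of random break pairs where |DTR break| > |Tm break|."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        tx_break = rng.uniform(-bound, bound)
        tn_break = rng.uniform(-bound, bound)
        dtr_break = abs(tx_break - tn_break)       # DTR = Tx - Tn
        tm_break = abs(tx_break + tn_break) / 2.0  # Tm = (Tx + Tn) / 2
        if dtr_break > tm_break:
            hits += 1
    return hits / n


print(dtr_exceeds_tm_fraction(100000, seed=1))
```

Under these assumptions the DTR break is the smaller of the two only when the Tx and Tn breaks share a sign and lie within a factor of three of each other, which works out at one third of the square, so the printed fraction should sit near two thirds.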


Tuesday, December 9, 2014

What has changed since the version 1 release of the Databank?

It has been nearly six months since we released the first version of the databank. While this was a big achievement for the International Surface Temperature Initiative, our work is not done. We have taken on many different tasks since the release, and a brief description is below:

Monthly Update System
As described in this post, we have implemented a monthly update system appending near real time (NRT) data into the databank. On the 5th of each month 4 sources (ghcnd, climat-ncdc, climat-uk, mcdw-unpublished) update their Stage 1 data, and on the 11th, their common formatted data (Stage 2) are then updated. In addition, an algorithm is applied appending new data to the recommended merge, and that is updated on the 11th as well.

Bug Fixes
Users have submitted some minor issues with version 1. Some stations in Serbia were given a country code of "RB" when they should have been given "RI." These have been addressed, and a new version of the databank (v1.0.1) was released.

There have been concerns about how the station name is displayed. Non-ASCII characters pose problems with some text interpreters. A module has been created in the Stage 1 to Stage 2 conversion scripts where these characters are either changed or removed to avoid this problem in the future.
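One common way to do this kind of clean-up is sketched below. It is not the databank's conversion module, just an illustration using Python's standard library: accented characters are decomposed and their accents dropped, while anything with no ASCII equivalent is removed.

```python
import unicodedata


def to_ascii_station_name(name):
    """Replace accented characters with ASCII equivalents; drop the rest."""
    # NFKD splits e.g. 'u with diaeresis' into 'u' plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Encoding to ASCII with errors ignored discards the combining marks
    # and any character that has no ASCII decomposition.
    return decomposed.encode("ascii", "ignore").decode("ascii")


for station in ["Zürich", "São Paulo", "Tromsø"]:
    print(station, "->", to_ascii_station_name(station))
```

Note that the last name loses a letter altogether, which is the "removed" rather than "changed" case; a lookup table of explicit replacements would be needed to do better.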

Of course issues could still exist; if you find any, please let us know! As an open and transparent initiative, we encourage constructive criticism and will apply any reasonable suggestions to future versions.

New Sources
We have acquired new sources that will be added as Stage 1 and Stage 2 data soon, including
  • 300 UK Stations from the Met Office
  • German data released by DWD
  • EPA's Oregon Crest to Coast Dataset
  • LCA&D: Latin American Climate Assessment and Dataset
  • Daily Chinese Data
  • NCAR Surface Libraries
  • Stations from Meteomet project
  • Libya Stations sent by their NMS
  • C3/EURO4M Stations
  • Additional Digitized Stations from the University of Giessen
  • Homogenized Iranian Data
It is not too late to submit new data. If you have a lead on sources please let us know at We will freeze the sources again on February 28th, 2015, in order to work on the next version of the merge.

Friday, December 5, 2014

Discovering NCDC's hard copy holdings

Update Dec 11th: permanent link with some browser issues resolved at

NOAA's National Climatic Data Center have undertaken an inventory of their substantial basement holdings of hard copy data. These include a rich mix of data types on varied media including paper, fiche and microfilm.

One row of several dozen in the NCDC archive of hard copy paper holdings from around the world

Microfilm holdings arising from Europe over the second world war

Some, but far from all, of this data has been imaged and / or digitized. NCDC have now released the catalogue online and made it searchable. The catalogue interface can be found at (click on search records). The degree to which a given holding has been catalogued varies but this is a good place to at least begin to ascertain what holdings exist and what their status is. For example, searching on American Samoa as country provides a list of holdings, most of which are hard copy only.
Example search results for American Samoa
For those interested in aspects of data rescue, this is likely to be a useful tool to ascertain whether NCDC hold any relevant records. By reasonable estimates at least as much data exists in hard copy / imaged format as has been digitised for the pre-1950 period. That is a lot of unknown knowns and could provide such rich information to improve understanding ...

Wednesday, November 26, 2014

A set of flyers for promoting the initiative's aims and outcomes

We have produced a set of one-sider flyers to promote the initiative and its aims and to try to engender additional inputs, collaborations and contributions. These will be taken by Kate Willett to the forthcoming COP meeting in Peru next month.

We strongly encourage use of these flyers at appropriate venues to support the further advancement of our work.

The set of flyers can be found at There are flyers on:
[links are to pdf versions]

Our more eagle eyed readers will have noted above a new strand to our work. I am delighted to say that we have, following the most recent steering committee call, formally recognized the efforts led by Victor Venema and Renate Auchmann to populate and exploit a database of parallel measurements by instigating a new expert team under the databank working group. We shall do all we can to support this important effort and in the first instance we encourage readers to help us in the identification and collection of such holdings.

A stub page is available at which we shall populate over the coming months. In the meantime more information on this effort can be found at

Wednesday, November 5, 2014

Release of a daily benchmark dataset - version 1

The ISTI benchmark working group includes a PhD student looking at benchmarking daily temperature homogenisation algorithms. This largely follows the concepts laid out in the benchmark working group's publication. Significant progress has been made in this field. This post announces the release of a small daily benchmark dataset focusing on four regions in North America. These regions can be seen in Figure 1. 

Figure 1 Station locations of the four benchmark regions. Blue stations are in all worlds. Red stations only appear in worlds 2 and 3.

These benchmarks have similar aims to the global benchmarks that are currently being produced by the ISTI working group, namely to:

  1. Assess the performance of current homogenisation algorithms and provide feedback to allow for their improvement 
  2. Assess how realistic the created benchmarks are, to allow for improvements in future iterations 
  3. Quantify the uncertainty that is present in data due to inhomogeneities both before and after homogenisation algorithms have been run on them

A perfect algorithm would return the inhomogeneous data to their clean form – correctly identifying the size and location of the inhomogeneities and adjusting the series accordingly. The inhomogeneities that have been added will not be made known to the testers until the completion of the assessment cycle – mid 2015. This is to ensure that the study is as fair as possible with no testers having prior knowledge of the added inhomogeneities.

The data are formed into three worlds, each consisting of the four regions shown in Figure 1. World 1 is the smallest and contains only those stations shown in blue in Figure 1; Worlds 2 and 3 are the same size as each other and contain all the stations shown.

Homogenisers are requested to prioritise running their algorithms on a single region across worlds instead of on all regions in a single world. This will hopefully maximise the usefulness of this study in assessing the strengths and weaknesses of the process. The order of prioritisation for the regions is Wyoming, South East, North East and finally the South West.

This study will be more effective the more participants it has and if you are interested in participating please contact Rachel Warren (rw307 AT The results will form part of a PhD thesis and therefore it is requested that they are returned no later than Friday 12th December 2014. However, interested parties who are unable to meet this deadline are also encouraged to contact Rachel.

There will be a further smaller release in the next week that is just focussed on Wyoming and will explore climate characteristics of data instead of just focusing on inhomogeneity characteristics.

Monday, October 6, 2014

A framework for benchmarking of homogenisation algorithm performance on the global scale - Paper now published

The ISTI benchmarking working group have just had their first benchmarking paper accepted at Geoscientific Instrumentation, Methods and Data Systems:

Willett, K., Williams, C., Jolliffe, I. T., Lund, R., Alexander, L. V., Brönnimann, S., Vincent, L. A., Easterbrook, S., Venema, V. K. C., Berry, D., Warren, R. E., Lopardo, G., Auchmann, R., Aguilar, E., Menne, M. J., Gallagher, C., Hausfather, Z., Thorarinsdottir, T., and Thorne, P. W.: A framework for benchmarking of homogenisation algorithm performance on the global scale, Geosci. Instrum. Method. Data Syst., 3, 187-200, doi:10.5194/gi-3-187-2014, 2014.

Benchmarking, in this context, is the assessment of homogenisation algorithm performance against a set of realistic synthetic worlds of station data where the locations and size/shape of inhomogeneities are known a priori. Crucially, these inhomogeneities are not known to those performing the homogenisation, only those performing the assessment. Assessment of both the ability of algorithms to find changepoints and accurately return the synthetic data to its clean form (prior to addition of inhomogeneity) has three main purposes:

      1) quantification of uncertainty remaining in the data due to inhomogeneity
      2) inter-comparison of climate data products in terms of fitness for a specified purpose
      3) providing a tool for further improvement in homogenisation algorithms

Here we describe what we believe would be a good approach to a comprehensive homogenisation algorithm benchmarking system. This includes an overarching cycle of: benchmark development; release of formal benchmarks; assessment of homogenised benchmarks and an overview of where we can improve for next time around (Figure 1).

Figure 1 Overview of the ISTI comprehensive benchmarking system for assessing performance of homogenisation algorithms. (Fig. 3 of Willett et al., 2014)

There are four components to creating this benchmarking system. 

Creation of realistic clean synthetic station data

Firstly, we must be able to synthetically recreate the 30000+ ISTI stations such that they have the same variability, auto-correlation and interstation cross-correlations as the real data but are free from systematic error. In other words, they must contain a realistic seasonal cycle and features of natural variability (e.g., ENSO, volcanic eruptions etc.). There must be a realistic persistence month-to-month in each station and geographically across nearby stations.

Creation of realistic error models to add to the clean station data

The added inhomogeneities should cover all known types of inhomogeneity in terms of their frequency, magnitude and seasonal behaviour. For example, inhomogeneities could be any or a combination of the following:

     - geographically or temporally clustered due to events which affect entire networks or regions (e.g. change in observation time);
     - close to end points of time series;
     - gradual or sudden;
     - variance-altering;
     - combined with the presence of a long-term background trend;
     - small or large;
     - frequent;
     - seasonally or diurnally varying.
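
As a rough illustration of the 'gradual or sudden' distinction in the list above, the following minimal Python sketch adds a step change and a linear drift to a clean series. The function names (add_step, add_drift) and the use of list positions as the time axis are assumptions made purely for illustration; they are not taken from the benchmark code.

```python
def add_step(series, start, size):
    """Return a copy of series with a sudden offset of `size`
    applied from index `start` onwards (hypothetical helper)."""
    return [v + size if i >= start else v for i, v in enumerate(series)]


def add_drift(series, start, end, total):
    """Return a copy of series with a gradual inhomogeneity:
    a linear ramp from 0 at index `start` to `total` at index `end`,
    held at `total` afterwards. Requires end > start (hypothetical helper)."""
    if end <= start:
        raise ValueError("end must be greater than start")
    out = []
    for i, v in enumerate(series):
        if i < start:
            offset = 0.0
        elif i >= end:
            offset = total
        else:
            offset = total * (i - start) / (end - start)
        out.append(v + offset)
    return out
```

Real error models would also need to handle the seasonally varying and variance-altering cases, which simple offsets like these do not capture.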

Design of an assessment system

Assessment of the homogenised benchmarks should be designed with the three purposes of benchmarking in mind. Both the ability to correctly locate changepoints and to adjust the data back to its homogeneous state are important. It can be split into four different levels:

     - Level 1: The ability of the algorithm to restore an inhomogeneous world to its clean world state in terms of climatology, variance and trends.

     - Level 2: The ability of the algorithm to accurately locate changepoints and detect their size/shape.

     - Level 3: The strengths and weaknesses of an algorithm against specific types of inhomogeneity and observing system issues.

     - Level 4: A comparison of the benchmarks with the real world in terms of detected inhomogeneity both to measure algorithm performance in the real world and to enable future improvement to the benchmarks.
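
To make Level 2 a little more concrete, here is a minimal Python sketch of one possible way to score detected changepoints against the known ones. The tolerance window, the greedy nearest-match rule and the function name score_changepoints are all assumptions for illustration; the assessment metrics actually used in the benchmarking may well differ.

```python
def score_changepoints(true_cps, detected_cps, tolerance=12):
    """Count hits, misses and false alarms.

    true_cps / detected_cps: changepoint positions as time-step indices.
    tolerance: maximum distance (in time steps) for a detection to count
    as a hit. Each detection can be matched to at most one true
    changepoint (simple greedy matching, an illustrative choice).
    """
    unmatched = sorted(detected_cps)
    hits = 0
    for t in sorted(true_cps):
        candidates = [d for d in unmatched if abs(d - t) <= tolerance]
        if candidates:
            # Use the closest remaining detection for this true changepoint.
            best = min(candidates, key=lambda d: abs(d - t))
            unmatched.remove(best)
            hits += 1
    return {
        "hits": hits,
        "misses": len(true_cps) - hits,
        "false_alarms": len(unmatched),
    }
```

A fuller assessment would also compare the size and shape of each detected break with the truth, not just its location.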

The benchmark cycle

This should all take place within a well laid out framework to encourage people to take part and make the results as useful as possible. Timing is important. Too long a cycle will mean that the benchmarks become outdated. Too short a cycle will reduce the number of groups able to participate.

Producing the clean synthetic station data on the global scale is a complicated task that has now taken several years but we are close to completion of a version 1. We have collected together a list of known regionwide inhomogeneities and a comprehensive understanding of the many many different types of inhomogeneities that can affect station data. We have also considered a number of assessment options and decided to focus on levels 1 and 2 for assessment within the benchmark cycle. Our benchmarking working group is aiming for release of the first benchmarks by January 2015.

Friday, September 12, 2014

The Databank Near Real Time Update System

Since the official release back in June, we have worked to keep the databank updated with the most recent data. Each month we will post new data from sources that update in near-real-time (NRT), along with an updated version of the recommended merge with the latest data appended. Stage 1 data (digitized in its original form) will be updated no later than the 5th of each month, and then Stage 2 (common formatted data) and Stage 3 (merged record) data will be updated no later than the 11th of the month.

So what data gets updated in our NRT system? We have identified four sources that provide updated data within the first few days of the month. They are the CLIMAT streams from NCDC and from the UK, the unpublished form of the Monthly Climatic Data for the World (MCDW), and finally GHCN-D. As in the merge program, a hierarchy determines which source's data are appended when sources conflict. The hierarchy is here:

4) MCDW-Unpublished

An overview of the system is shown in this flow diagram:

The algorithm to append data looks for station matches through the same metadata tests as described in the merge program. These include geographic distance, height distance, and station name similarity using the Jaccard Index. If the metadata metric is good, then an ID test is used to determine station match. Because the four input sources have either a GHCN-D or WMO ID, the matching is much easier here than in the merge program. Once a station match is found, new data from the past few months are appended. Throughout this process, no new stations are added.
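
For readers unfamiliar with the Jaccard Index, the short Python sketch below shows how it can be used to compare two station names. It treats each name as a set of character bigrams, which is an assumption made for illustration; the post does not say which elements the merge program compares, and the function names here are hypothetical.

```python
def bigrams(name):
    """Set of adjacent character pairs in a name, ignoring case and spaces."""
    s = "".join(name.upper().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(name_a, name_b):
    """Jaccard index: size of the intersection divided by size of the union.
    Returns 1.0 for identical sets and 0.0 for sets with nothing in common."""
    a, b = bigrams(name_a), bigrams(name_b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)
```

Names that differ only in case or spacing score 1.0 under this scheme, while names sharing no bigrams score 0.0.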

We have had two monthly updates so far. As always the latest recommended merge data can be found on our ftp page here, along with older data placed in the archive here. Note that we are only updating the recommended merge, and not the variants. In addition, the merge metadata is not updated, because no new merge has been applied yet. We plan to have another merge out sometime in early 2015.

Friday, August 29, 2014

ccc-gistemp and ISTI

This is a guest post by David Jones of the Climate Code Foundation. It is a mirror of their post at

ccc-gistemp is Climate Code Foundation’s rewrite of the NASA GISS Surface Temperature Analysis GISTEMP. It produces exactly the same result, but is written in clear Python.

I’ve recently modified ccc-gistemp so that it can use the dataset recently released by the International Surface Temperature Initiative. Normally ccc-gistemp uses GHCN-M, but the ISTI dataset is much larger. Since ISTI publish the Stage 3 dataset in the same format as GHCN-M v3 the required changes were relatively minor, and Climate Code Foundation appreciates the fact that ISTI is published in several formats, including GHCN-M v3.

The ISTI dataset is not quality controlled, so, after re-reading section 3.3 of Lawrimore et al 2011, I implemented an extremely simple quality control scheme, MADQC. In MADQC a data value is rejected if its distance from the median (for its station’s named month) exceeds 5 times the median absolute deviation (MAD, hence MADQC); any series with fewer than 20 values (for each named month) is rejected.
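
The rule described above can be sketched in a few lines of Python. This is an illustrative reading of the description, not the ccc-gistemp code itself: the data layout (a dict mapping year to twelve monthly values, with None for missing), the function name madqc, and the choice to reject a whole named month when it has fewer than 20 values are all assumptions.

```python
from statistics import median


def madqc(records, threshold=5.0, min_values=20):
    """Apply a median-absolute-deviation check per named month.

    records: dict mapping year -> list of 12 monthly values (None = missing).
    Returns a new dict of the same shape with rejected values set to None;
    the input is left unchanged.
    """
    cleaned = {year: list(row) for year, row in records.items()}
    for month in range(12):
        values = [row[month] for row in records.values()
                  if row[month] is not None]
        if len(values) < min_values:
            # Too few values for this named month: reject them all.
            for year in cleaned:
                cleaned[year][month] = None
            continue
        med = median(values)
        mad = median([abs(v - med) for v in values])
        for year, row in records.items():
            v = row[month]
            # Taken literally, a MAD of zero rejects every value that
            # differs from the median.
            if v is not None and abs(v - med) > threshold * mad:
                cleaned[year][month] = None
    return cleaned
```

Because both the median and the MAD are insensitive to a handful of extreme values, a single gross error does not inflate the threshold used to detect it.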

So far I’ve found MADQC to be reasonable at rejecting the grossest non-climatic errors.

Let’s compare the ccc-gistemp analysis using the ISTI Stage 3 dataset versus using the GHCN-M QCU dataset. The analysis for each hemisphere:

For both hemispheres the agreement is generally good and certainly within the published error bounds.

Zooming in on the recent period:

Now we can see the agreement in the northern hemisphere is excellent. In the southern hemisphere agreement is very good. The trend is slightly higher for the ISTI dataset.

The additional data that ISTI has gathered is most welcome, and this analysis shows that the warming trend in both hemispheres was not due to choosing a particular set of stations for GHCN-M. The much more comprehensive station network of ISTI shows the same trends.