Friday, August 29, 2014

ccc-gistemp and ISTI

This is a guest post by David Jones of the Climate Code Foundation. It is a mirror of their post at

ccc-gistemp is Climate Code Foundation’s rewrite of the NASA GISS Surface Temperature Analysis GISTEMP. It produces exactly the same result, but is written in clear Python.

I’ve recently modified ccc-gistemp so that it can use the dataset recently released by the International Surface Temperature Initiative. Normally ccc-gistemp uses GHCN-M, but the ISTI dataset is much larger. Since ISTI publish the Stage 3 dataset in the same format as GHCN-M v3, the required changes were relatively minor, and Climate Code Foundation appreciates the fact that ISTI is published in several formats, including GHCN-M v3.

The ISTI dataset is not quality controlled, so, after re-reading section 3.3 of Lawrimore et al 2011, I implemented an extremely simple quality control scheme, MADQC. In MADQC a data value is rejected if its distance from the median (for its station’s named month) exceeds 5 times the median absolute deviation (MAD, hence MADQC); any series with fewer than 20 values (for each named month) is rejected.
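As a rough illustration of the rule just described, here is a minimal Python sketch for one station and one named month. It is not the code in ccc-gistemp: the function name `madqc_month`, the use of `None` for missing or rejected values, and the choice to keep a value lying exactly at 5 times the MAD are all assumptions made for this example.

```python
from statistics import median

MAD_MULTIPLE = 5   # reject values further than this many MADs from the median
MIN_VALUES = 20    # reject the whole series if it has fewer values than this


def madqc_month(values, mad_multiple=MAD_MULTIPLE, min_values=MIN_VALUES):
    """Apply the MADQC rule to one station's values for one named month.

    `values` holds one entry per year (for example, every January),
    with None marking a missing value. The result has the same length,
    with rejected entries replaced by None.
    """
    present = [v for v in values if v is not None]

    # Too few values for this named month: reject the whole series.
    if len(present) < min_values:
        return [None] * len(values)

    med = median(present)
    # Median absolute deviation: the median distance from the median.
    mad = median(abs(v - med) for v in present)
    limit = mad_multiple * mad

    return [
        v if v is not None and abs(v - med) <= limit else None
        for v in values
    ]


# Example: 21 plausible values (hundredths of a degree) plus one gross error.
januaries = list(range(1000, 1210, 10)) + [5000]
cleaned = madqc_month(januaries)
```

One consequence of taking the rule literally: if more than half of the values are identical, the MAD is zero and every value that differs from the median is rejected. A production scheme might want to treat that case separately.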

So far I’ve found MADQC to be reasonable at rejecting the grossest non-climatic errors.

Let’s compare the ccc-gistemp analysis using the ISTI Stage 3 dataset versus using the GHCN-M QCU dataset. The analysis for each hemisphere:

For both hemispheres the agreement is generally good and certainly within the published error bounds.

Zooming in on the recent period:

Now we can see the agreement in the northern hemisphere is excellent. In the southern hemisphere agreement is very good. The trend is slightly higher for the ISTI dataset.

The additional data that ISTI has gathered is most welcome, and this analysis shows that the warming trend in both hemispheres was not due to choosing a particular set of stations for GHCN-M. The much more comprehensive station network of ISTI shows the same trends.


  1. David Jones, GHCN-M is homogenized, right? Or did you use the raw data?

    The ISTI dataset is not homogenized, right? Or did you get access to a preliminary homogenized version?

    If GHCN is homogenized and the ISTI dataset is not, then it is quite interesting that the raw data shows a stronger trend than the homogenized data. In GHCNv3 homogenization makes the trend a little stronger. Here it seems to be the opposite. Would be fun, then the climate "sceptics" will soon be fans of homogenization.

    1. QCU, not QCA. So this is Quality Controlled, but not "adjusted". So the GHCN-M data I use here is not homogenized. Similarly, it's the ISTI Stage 3 data, so again it is not homogenized. Does ISTI have plans to homogenize the data? I wasn't aware of any plans to do so.

      Sorry for not making this clearer in the post. But I am comparing two unhomogenized datasets.

    2. I should point out that ISTI is not entirely raw, as some of the source datasets are homogenized prior to incorporation in the databank (eg, Canada, HISTALP). But why am I telling you this, you're a co-author on Rennie et al 2014!

    3. David,

      there are most definitely plans to homogenize the databank. Not just one way but as many ways as possible. The databank is simply the starting point for the real science, augmented by the benchmarks. We would love to see 20 groups try to homogenize the databank and submit their algorithms to the benchmarking exercise. All our work on the benchmarks and e.g. in the SAMSI/IMAGe workshop is towards that end. We need more efforts and not fewer. And those efforts should all work off the databank and all submit to the benchmarking so we can properly assess the resulting estimates. Do you fancy creating the CCF-homogenization-algorithm? :-)

    4. CCF is in the game of making science easier to understand. So far our take on homogenisation is that it makes so little difference to hemispheric trends that for understanding global warming, it's just not worth it.

      As for clarifying other people's homogenization algorithms: Daniel Rothenberg has already had a go at Menne and Williams' algorithm for GHCN-M v3.

    5. If we are solely interested in global or hemispheric means then it is likely that homogenisation won't nudge the value terribly far. It's certainly not going to change the trend sign or halve / double the multi-decadal warming rate globally. But I don't know too many folks who reside globally :-).

      What matters to people is how climate is changing regionally and locally and there homogenisation really starts to matter a lot. To help people make informed decisions it's key we provide the best possible information, with uncertainties rigorously quantified, all the way from the individual station datapoint up to the multi-decadal global mean.

  2. Victor,

    I believe Dave used the homogenized GHCN-M here. Regardless, we found that the result also occurs in the raw data. See

  3. Yes, quite a difference between GHCN raw and ISTI raw. Then we should really compare like with like and my above comparison would be too simple. Still intriguing, looking forward to seeing the first homogenized versions of the ISTI database and trying to understand any differences.