Surface temperatures

Thursday, January 1, 2015

Survey on national homogenised temperature data sets

I've recently run a survey on national homogenised temperature data sets. Whilst this was not an exhaustive survey (as indicated by the number of responses), it is an indication of what's out there and what resources various countries are putting into this work.

Survey reports were received from 18 countries (CHN, CAN, ISR, IRL, SUI, SLO, NOR, HUN, NED, ROM, GBR, AUT, SRB, ESP, CZE, SWE, UKR, AUS) and 1 region (Catalonia). Summary results were as follows:

1. Number of staff involved in homogenisation (full-time equivalent)

Less than 1: 2 countries

1-2: 9

2-4: 5

4 or more: 3

(global and continental data sets are excluded from this - for example, the UK have several people working on the HadCRUT data sets, and the Netherlands on ECA&D and associated projects)

2. Existence of a national homogenised data set

Yes: 16

Yes but not yet released: 1

No national set but a station/regional set: 1

No: 1

3. Time resolution of data set

Daily: 8

Monthly: 7

Mix depending on element: 1

Monthly for early data, daily for later: 1

4. Time resolution of adjustment

Results from this are a little unclear – several responses indicated use of the Vincent methodology, which interpolates adjustments based on monthly values to daily timescales.

Daily: 4

Monthly: 11

Monthly for detection, daily for adjustment: 2
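
As a rough illustration of what "interpolating monthly adjustments to daily timescales" can look like, here is a minimal sketch that linearly interpolates between mid-month anchor points. It is not the Vincent methodology itself: the function name, the fixed non-leap-year month lengths and the use of the monthly adjustment as the mid-month value are all assumptions made for illustration.

```python
# Hypothetical sketch: spread 12 monthly adjustments over 365 days by linear
# interpolation between mid-month anchors (non-leap year assumed).
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def daily_adjustments(monthly_adj):
    """Return 365 daily adjustments interpolated from 12 monthly values."""
    if len(monthly_adj) != 12:
        raise ValueError("expected 12 monthly adjustments")
    # Mid-month position of each month, in days from the start of the year.
    mids, start = [], 0
    for length in MONTH_LENGTHS:
        mids.append(start + length / 2.0)
        start += length
    daily = []
    for day in range(365):
        t = day + 0.5  # centre of the day
        if t < mids[0]:
            t += 365.0  # early January blends with the previous December
        for m in range(12):
            nxt = (m + 1) % 12
            left = mids[m]
            right = mids[nxt] + (365.0 if nxt == 0 else 0.0)
            if left <= t < right:
                w = (t - left) / (right - left)
                daily.append((1.0 - w) * monthly_adj[m] + w * monthly_adj[nxt])
                break
    return daily
```

Note that plain linear interpolation like this will not, in general, reproduce the monthly adjustment when the daily values are averaged back up to months, so a real implementation needs more care than this sketch takes.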

5. Elements included

Maximum, minimum and mean temperature: 8

Maximum and minimum temperature: 5

Mean temperature only: 4

(note that ‘maximum and minimum temperature’ implies mean temperature is not homogenised independently – in most cases it can still be calculated based on max/min)

6. Frequency of updating/reassessing homogeneity

Not updated: 6 (in 2 cases, the first data set has only just been completed)

Appended with unadjusted data only: 2

Irregularly: 1

Annually or near-annually: 4

Intervals longer than 2 years: 4 (ranging from every 3 to every 10 years)

Thursday, December 11, 2014

Why we need max and min temperatures for all stations

I'm doing an analysis of Diurnal Temperature Range (DTR; more on that when published) but as part of this I just played with a little toy box model and the result is sufficiently of general interest to highlight here and maybe get some feedback.

So, for most stations in the databank we have data for maximum (Tx) and minimum (Tn) that we then average to get Tm. Now, that is not the only transform possible - there is also DTR, which is Tx-Tn. Although that is not part of the databank archive, it's a trivial transform. In results from running NCDC's pairwise algorithm, distinct differences in breakpoint detection efficacy and adjustment distribution arise between the elements, which have caused great author team angst.

This morning I constructed a simple toy box where I just played 'what if'. More precisely: what if I allowed seeded breaks in Tx and Tn within the bounds -5 to 5 and considered the break size effects in Tx, Tn, Tm and DTR:

The top two panels are hopefully pretty self-explanatory. Tm and DTR effects are orthogonal, which makes sense. In the lowest panel (note colours chosen from colorbrewer but please advise if there are issues for colour-blind folks):
red: Break largest in Tx
blue: Break largest in Tn
purple: break largest in DTR
green: break largest in Tm (yes, there is precisely no green)
Cases with breaks equal in size are no colour (infinitesimally small lines along the diagonal and vertices at Tx and Tn = 0)

So …

If we just randomly seeded Tx and Tn breaks in an entirely uncorrelated manner into the series then we would get 50% of breaks largest in DTR and 25% each in Tx and Tn. DTR should be broader in its overall distribution and Tm narrower, with Tx and Tn intermediate.

If we put in correlated Tx and Tn breaks such that they were always the same sign (but not magnitude) then they would always be largest in either Tx or Tn (or equal with Tm when Tx=Tn).

If we put in anti-correlated breaks then they would always be largest in DTR.

Perhaps most importantly, as alluded to above, breaks will only be equal largest for Tm in a very special set of cases where Tx break = Tn break. Breaks, on average, will be smallest in Tm. If breakpoint detection and adjustment is a signal-to-noise problem, it's not sensible to look where the signal is smallest. This has potentially serious implications for our ability to detect and adjust for breakpoints if we limit ourselves to Tm and is why we should try to rescue Tx and Tn data for the large amount of early data for which we only have Tm in the archives.
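
For anyone who wants to replay the toy box, here is a minimal Monte Carlo sketch of the uncorrelated case. It assumes the seeded Tx and Tn breaks are drawn independently and uniformly from -5 to 5 (the uniform choice is my assumption for the sketch) and the function name is made up for illustration.

```python
import random


def largest_break_fractions(n=100_000, bound=5.0, seed=0):
    """Fraction of seeded cases in which each element shows the largest break."""
    rng = random.Random(seed)
    counts = {"Tx": 0, "Tn": 0, "Tm": 0, "DTR": 0}
    for _ in range(n):
        tx = rng.uniform(-bound, bound)  # seeded break in maximum temperature
        tn = rng.uniform(-bound, bound)  # seeded break in minimum temperature
        sizes = {
            "Tx": abs(tx),
            "Tn": abs(tn),
            "Tm": abs((tx + tn) / 2.0),  # break as seen in the mean
            "DTR": abs(tx - tn),         # break as seen in the range
        }
        # max() returns the first key with the largest size, so exact ties
        # go to the primary elements (Tx, Tn) ahead of the derived ones.
        winner = max(sizes, key=sizes.get)
        counts[winner] += 1
    return {name: count / n for name, count in counts.items()}


if __name__ == "__main__":
    print(largest_break_fractions())
```

With opposite-signed breaks |Tx-Tn| = |Tx|+|Tn|, so DTR always wins; with same-signed breaks it never does, which is where the 50/25/25 split comes from.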

Maybe in future we can consider this as an explicitly joint estimation problem of finding breaks in the two primary elements and two derived elements and then constructing physically consistent adjustment estimates from the element-wise CDFs. Okay, I'm losing you now I know so I'll shut up ... for now ...

Update:

Bonus version showing how much more frequently DTR is larger than Tm:

Tuesday, December 9, 2014

What has changed since the version 1 release of the Databank?

It has been nearly six months since we released the first version of the databank. While this was a big achievement for the International Surface Temperature Initiative, our work is not done. We have taken on many different tasks since the release, and a brief description is below:

Monthly Update System
As described in this post, we have implemented a monthly update system appending near real time (NRT) data into the databank. On the 5th of each month 4 sources (ghcnd, climat-ncdc, climat-uk, mcdw-unpublished) update their Stage 1 data, and on the 11th, their common formatted data (Stage 2) are then updated. In addition, an algorithm is applied appending new data to the recommended merge, and that is updated on the 11th as well.

Bug Fixes
Users have submitted some minor issues with version 1. Some stations in Serbia were given a country code of "RB" when they should have been given "RI." These have been addressed, and a new version of the databank (v1.0.1) was released.

There have been concerns about how the station name is displayed. Non-ASCII characters pose problems with some text interpreters. A module has been created in the Stage 1 to Stage 2 conversion scripts where these characters are either changed or removed to avoid this problem in the future.
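
As a minimal sketch of the "changed or removed" idea (this is not the databank's actual conversion module, and the function name is hypothetical), accented characters can be decomposed and reduced to their ASCII base letter, while characters with no ASCII equivalent are dropped:

```python
import unicodedata


def ascii_station_name(name):
    """Return an ASCII-only version of a station name.

    Accented letters are changed to their base letter where Unicode defines
    a decomposition; anything still non-ASCII afterwards is removed.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for example in ["Zürich", "São Paulo", "Tromsø"]:
        print(example, "->", ascii_station_name(example))
```

Here "Zürich" is changed to "Zurich", whereas the "ø" in "Tromsø" has no decomposition and is simply removed, so a production version would probably want an explicit substitution table for such letters.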

Of course issues could still exist; if you find any, please let us know! As an open and transparent initiative, we encourage constructive criticism and will apply any reasonable suggestions to future versions.

New Sources
We have acquired new sources that will be added as Stage 1 and Stage 2 data soon, including

300 UK Stations from the Met Office
German data released by DWD
EPA's Oregon Crest to Coast Dataset
LCA&D: Latin American Climate Assessment and Dataset
Daily Chinese Data
NCAR Surface Libraries
Stations from Meteomet project
Libya Stations sent by their NMS
C3/EURO4M Stations
Additional Digitized Stations from the University of Giessen
Homogenized Iranian Data

It is not too late to submit new data. If you have a lead on sources please let us know at data.submission@surfacetemperatures.org. We will freeze the sources again on February 28th, 2015, in order to work on the next version of the merge.

Friday, December 5, 2014

Discovering NCDC's hard copy holdings

Update Dec 11th: permanent link with some browser issues resolved at http://www.ncdc.noaa.gov/webartis

NOAA's National Climatic Data Center have undertaken an inventory of their substantial basement holdings of hard copy data. These include a rich mix of data types on varied media including paper, fiche and microfilm.

One row of several dozen in the NCDC archive of hard copy paper holdings from around the world

Microfilm holdings arising from Europe over the second world war

Some, but far from all, of this data has been imaged and / or digitized. NCDC have now released the catalogue online and made it searchable. The catalogue interface can be found at https://www.ncdc.noaa.gov/cdo/f?p=222 (click on search records). The degree to which a given holding has been catalogued varies, but this is a good place to at least begin to ascertain what holdings there are and what their status is. For example, searching on American Samoa as country provides a list of holdings, most of which are hard copy only.

Example search results for American Samoa

For those interested in aspects of data rescue, this is likely to be a useful tool to ascertain whether NCDC hold any relevant records. By reasonable estimates at least as much data exists in hard copy / imaged format as has been digitised for the pre-1950 period. That is a lot of unknown knowns and could provide such rich information to improve understanding ...

Wednesday, November 26, 2014

A set of flyers for promoting the initiative's aims and outcomes

We have produced a set of one-sider flyers to promote the initiative and its aims and to try to engender additional inputs, collaborations and contributions. These will be taken by Kate Willett to the forthcoming COP meeting in Peru next month.

We strongly encourage use of these flyers at appropriate venues to support the further advancement of our work.

The set of flyers can be found at http://www.surfacetemperatures.org/promotional_materials. There are flyers on:

[links are to pdf versions]

Our more eagle-eyed readers will have noted above a new strand to our work. I am delighted to say that we have, following the most recent steering committee call, formally recognized the efforts led by Victor Venema and Renate Auchmann to populate and exploit a database of parallel measurements by instigating a new expert team under the databank working group. We shall do all we can to support this important effort and in the first instance we encourage readers to help us in the identification and collection of such holdings.

A stub page is available at http://www.surfacetemperatures.org/databank/parallel_measurements which we shall populate over the coming months. In the meantime more information on this effort can be found at http://variable-variability.blogspot.no/2014/08/database-with-parallel-climate-measurements.html.

Wednesday, November 5, 2014

Release of a daily benchmark dataset - version 1

The ISTI benchmark working group includes a PhD student looking at benchmarking daily temperature homogenisation algorithms. This largely follows the concepts laid out in the benchmark working group's publication. Significant progress has been made in this field. This post announces the release of a small daily benchmark dataset focusing on four regions in North America. These regions can be seen in Figure 1.

Figure 1 Station locations of the four benchmark regions. Blue stations are in all worlds. Red stations only appear in worlds 2 and 3.

These benchmarks have similar aims to the global benchmarks that are currently being produced by the ISTI working group, namely to:

Assess the performance of current homogenisation algorithms and provide feedback to allow for their improvement
Assess how realistic the created benchmarks are, to allow for improvements in future iterations
Quantify the uncertainty that is present in data due to inhomogeneities both before and after homogenisation algorithms have been run on them

A perfect algorithm would return the inhomogeneous data to their clean form – correctly identifying the size and location of the inhomogeneities and adjusting the series accordingly. The inhomogeneities that have been added will not be made known to the testers until the completion of the assessment cycle – mid 2015. This is to ensure that the study is as fair as possible with no testers having prior knowledge of the added inhomogeneities.

The data are formed into three worlds, each consisting of the four regions shown in Figure 1. World 1 is the smallest and contains only those stations shown in blue in Figure 1; Worlds 2 and 3 are the same size as each other and contain all the stations shown.

Homogenisers are requested to prioritise running their algorithms on a single region across worlds instead of on all regions in a single world. This will hopefully maximise the usefulness of this study in assessing the strengths and weaknesses of the process. The order of prioritisation for the regions is Wyoming, South East, North East and finally the South West.

This study will be more effective the more participants it has and if you are interested in participating please contact Rachel Warren (rw307 AT exeter.ac.uk). The results will form part of a PhD thesis and therefore it is requested that they are returned no later than Friday 12th December 2014. However, interested parties who are unable to meet this deadline are also encouraged to contact Rachel.

There will be a further smaller release in the next week that is just focussed on Wyoming and will explore climate characteristics of data instead of just focusing on inhomogeneity characteristics.

Monday, October 6, 2014

A framework for benchmarking of homogenisation algorithm performance on the global scale - Paper now published

The ISTI benchmarking working group have just had their first benchmarking paper accepted at Geoscientific Instrumentation, Methods and Data Systems:

Willett, K., Williams, C., Jolliffe, I. T., Lund, R., Alexander, L. V., Brönnimann, S., Vincent, L. A., Easterbrook, S., Venema, V. K. C., Berry, D., Warren, R. E., Lopardo, G., Auchmann, R., Aguilar, E., Menne, M. J., Gallagher, C., Hausfather, Z., Thorarinsdottir, T., and Thorne, P. W.: A framework for benchmarking of homogenisation algorithm performance on the global scale, Geosci. Instrum. Method. Data Syst., 3, 187-200, doi:10.5194/gi-3-187-2014, 2014.

Benchmarking, in this context, is the assessment of homogenisation algorithm performance against a set of realistic synthetic worlds of station data where the locations and size/shape of inhomogeneities are known a priori. Crucially, these inhomogeneities are not known to those performing the homogenisation, only those performing the assessment. Assessment of both the ability of algorithms to find changepoints and accurately return the synthetic data to its clean form (prior to addition of inhomogeneity) has three main purposes:

1) quantification of uncertainty remaining in the data due to inhomogeneity
2) inter-comparison of climate data products in terms of fitness for a specified purpose
3) providing a tool for further improvement in homogenisation algorithms

Here we describe what we believe would be a good approach to a comprehensive homogenisation algorithm benchmarking system. This includes an overarching cycle of: benchmark development; release of formal benchmarks; assessment of homogenised benchmarks and an overview of where we can improve for next time around (Figure 1).

Figure 1 Overview of the ISTI comprehensive benchmarking system for assessing performance of homogenisation algorithms. (Fig. 3 of Willett et al., 2014)

There are four components to creating this benchmarking system.

Creation of realistic clean synthetic station data

Firstly, we must be able to synthetically recreate the 30000+ ISTI stations such that they have the same variability, auto-correlation and interstation cross-correlations as the real data but are free from systematic error. In other words, they must contain a realistic seasonal cycle and features of natural variability (e.g., ENSO, volcanic eruptions etc.). There must be a realistic persistence month-to-month in each station and geographically across nearby stations.
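
To give a flavour of the single-station part of this, here is a deliberately simple sketch: a fixed seasonal cycle plus first-order autoregressive (AR(1)) noise for month-to-month persistence. It is not the method used for the ISTI benchmarks; all parameter values and names are assumptions, and it ignores the interstation cross-correlations and natural variability features (ENSO, volcanoes) that the real thing must capture.

```python
import math
import random


def synthetic_monthly_series(n_years=50, mean=10.0, seasonal_amp=8.0,
                             phi=0.3, sigma=1.0, seed=1):
    """Return a list of 12 * n_years synthetic monthly temperatures.

    phi is the lag-1 autocorrelation of the anomalies and sigma the standard
    deviation of the random innovations (both illustrative values).
    """
    rng = random.Random(seed)
    series = []
    anomaly = 0.0
    for month in range(12 * n_years):
        # AR(1): this month's anomaly remembers a fraction of last month's.
        anomaly = phi * anomaly + rng.gauss(0.0, sigma)
        # Seasonal cycle with its minimum in the first month of the year.
        seasonal = -seasonal_amp * math.cos(2.0 * math.pi * (month % 12) / 12.0)
        series.append(mean + seasonal + anomaly)
    return series
```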

Creation of realistic error models to add to the clean station data

The added inhomogeneities should cover all known types of inhomogeneity in terms of their frequency, magnitude and seasonal behaviour. For example, inhomogeneities could be any or a combination of the following:

     - geographically or temporally clustered due to events which affect entire networks or regions (e.g. change in observation time);
     - close to end points of time series;
     - gradual or sudden;
     - variance-altering;
     - combined with the presence of a long-term background trend;
     - small or large;
     - frequent;
     - seasonally or diurnally varying.
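
Two of the simplest cases in the list, a sudden step and a gradual drift, can be sketched as follows. The function names and the choice to let the drift persist at its final value are assumptions for illustration, not the benchmark error models themselves.

```python
def add_step_break(series, start, size):
    """Add a sudden shift of `size` to every value from index `start` on."""
    return [v + size if i >= start else v for i, v in enumerate(series)]


def add_gradual_drift(series, start, end, total):
    """Ramp linearly from 0 at `start` to `total` at `end`, then hold."""
    out = []
    for i, v in enumerate(series):
        if i < start:
            shift = 0.0
        elif i >= end:
            shift = total
        else:
            shift = total * (i - start) / (end - start)
        out.append(v + shift)
    return out


if __name__ == "__main__":
    clean = [0.0] * 10
    print(add_step_break(clean, 4, 1.5))
    print(add_gradual_drift(clean, 2, 6, 2.0))
```

Both functions return a new list, so the clean series stays untouched and can later be compared with the homogenised result.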

Design of an assessment system

Assessment of the homogenised benchmarks should be designed with the three purposes of benchmarking in mind. Both the ability to correctly locate changepoints and to adjust the data back to its homogeneous state are important. It can be split into four different levels:

     - Level 1: The ability of the algorithm to restore an inhomogeneous world to its clean world state in terms of climatology, variance and trends.

     - Level 2: The ability of the algorithm to accurately locate changepoints and detect their size/shape.

     - Level 3: The strengths and weaknesses of an algorithm against specific types of inhomogeneity and observing system issues.

     - Level 4: A comparison of the benchmarks with the real world in terms of detected inhomogeneity both to measure algorithm performance in the real world and to enable future improvement to the benchmarks.
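
As a hedged sketch of what Level 1 and Level 2 scores could look like for a single station (the specific metrics, the function names and the tolerance window in time steps are my assumptions, not the working group's agreed assessment):

```python
import statistics


def linear_trend(series):
    """Ordinary least-squares slope per time step."""
    n = len(series)
    xbar = (n - 1) / 2.0
    ybar = statistics.fmean(series)
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(series))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den


def level1_scores(clean, homogenised):
    """Level 1: differences in climatology (mean), variance and trend."""
    return {
        "mean_diff": statistics.fmean(homogenised) - statistics.fmean(clean),
        "variance_diff": (statistics.pvariance(homogenised)
                          - statistics.pvariance(clean)),
        "trend_diff": linear_trend(homogenised) - linear_trend(clean),
    }


def level2_hit_rate(true_breaks, detected_breaks, tolerance=6):
    """Level 2: share of true changepoints with a detection within tolerance."""
    if not true_breaks:
        return None
    hits = sum(
        any(abs(t - d) <= tolerance for d in detected_breaks)
        for t in true_breaks
    )
    return hits / len(true_breaks)
```

A perfect algorithm would give zeros for all three Level 1 differences and a hit rate of 1.0; a fuller Level 2 assessment would also count false alarms and compare detected sizes.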

The benchmark cycle

This should all take place within a well laid out framework to encourage people to take part and make the results as useful as possible. Timing is important. Too long a cycle will mean that the benchmarks become outdated. Too short a cycle will reduce the number of groups able to participate.

Producing the clean synthetic station data on the global scale is a complicated task that has now taken several years but we are close to completion of a version 1. We have collected together a list of known regionwide inhomogeneities and built a comprehensive understanding of the many different types of inhomogeneities that can affect station data. We have also considered a number of assessment options and decided to focus on levels 1 and 2 for assessment within the benchmark cycle. Our benchmarking working group is aiming for release of the first benchmarks by January 2015.

Friday, September 12, 2014

The Databank Near Real Time Update System

Since the official release back in June, we have worked to keep the databank updated with the most recent data. Each month we will post new data from sources that update in near-real-time (NRT), along with an updated version of the recommended merge with the latest data appended. Stage 1 data (digitized in its original form) will be updated no later than the 5th of each month, and then Stage 2 (common formatted data) and Stage 3 (merged record) data will be updated no later than the 11th of the month.

So what data gets updated in our NRT system? We have identified four sources that have updated data within the first few days of the month. They are the CLIMAT streams from NCDC as well as the UK, the unpublished form of the Monthly Climatic Data for the World (MCDW) and finally GHCN-D. Similar to the merge program, a hierarchy determines which source's data are appended if there are conflicts. The hierarchy is here:

1) GHCN-D
2) CLIMAT-UK
3) CLIMAT-NCDC
4) MCDW-Unpublished
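
In code terms, resolving a conflict with such a hierarchy amounts to taking the first source in priority order that actually has a value. This is only a sketch of the idea: the function name and the dictionary layout are assumptions, and the source labels are simply the lower-case names used for the Stage 1 directories.

```python
# Priority order from the hierarchy above, highest priority first.
SOURCE_PRIORITY = ["ghcnd", "climat-uk", "climat-ncdc", "mcdw-unpublished"]


def pick_value(candidates):
    """Pick the value to append for one station and month.

    `candidates` maps source name to a value, or to None when that source
    has nothing for this station/month. Returns (source, value), or
    (None, None) when no source has data.
    """
    for source in SOURCE_PRIORITY:
        value = candidates.get(source)
        if value is not None:
            return source, value
    return None, None


if __name__ == "__main__":
    print(pick_value({"climat-ncdc": 12.3, "climat-uk": 12.1}))
```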

An overview of the system is shown here in this flow diagram (Click on image to enlarge):

The algorithm to append data looks for station matches through the same metadata tests as described in the merge program. These include geographic distance, height distance, and station name similarity using the Jaccard Index. If the metadata metric is good, then an ID test is used to determine station match. Because the four input sources have either a GHCN-D or WMO ID, the matching is much easier here than in the merge program. Once a station match is found, new data from the past few months are appended. Throughout this process, no new stations are added.
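
For readers unfamiliar with the Jaccard Index, it is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to station names by comparing sets of two-character chunks; that chunking, the upper-casing and the function name are my assumptions here, and the merge program may break names up differently.

```python
def jaccard_index(name_a, name_b):
    """Jaccard similarity of two station names, from 0.0 to 1.0."""
    def bigrams(text):
        text = text.upper().strip()
        return {text[i:i + 2] for i in range(len(text) - 1)}

    set_a, set_b = bigrams(name_a), bigrams(name_b)
    if not set_a and not set_b:
        return 1.0  # two empty or one-character names: treat as identical
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    print(jaccard_index("BOULDER", "BOULDER 2"))
```

Identical names score 1.0 and names with no chunks in common score 0.0, so a threshold in between can be combined with the distance and height tests.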

We have had two monthly updates so far. As always the latest recommended merge data can be found on our ftp page here, along with older data placed in the archive here. Note that we are only updating the recommended merge, and not the variants. In addition, the merge metadata is not updated, because no new merge has been applied yet. We plan to have another merge out sometime in early 2015.

Friday, August 29, 2014

ccc-gistemp and ISTI

This is a guest post by David Jones of the Climate Code Foundation. It is a mirror of their post at http://climatecode.org/blog/2014/08/ccc-gistemp-and-isti/

ccc-gistemp is Climate Code Foundation’s rewrite of the NASA GISS Surface Temperature Analysis GISTEMP. It produces exactly the same result, but is written in clear Python.

I’ve recently modified ccc-gistemp so that it can use the dataset recently released by the International Surface Temperature Initiative. Normally ccc-gistemp uses GHCN-M, but the ISTI dataset is much larger. Since ISTI publish the Stage 3 dataset in the same format as GHCN-M v3 the required changes were relatively minor, and Climate Code Foundation appreciates the fact that ISTI is published in several formats, including GHCN-M v3.

The ISTI dataset is not quality controlled, so, after re-reading section 3.3 of Lawrimore et al 2011, I implemented an extremely simple quality control scheme, MADQC. In MADQC a data value is rejected if its distance from the median (for its station’s named month) exceeds 5 times the median absolute deviation (MAD, hence MADQC); any series with fewer than 20 values (for each named month) is rejected.
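
The rule as described can be sketched in a few lines. This is an illustration written from the description above rather than the code in ccc-gistemp, and the function name and argument layout are hypothetical.

```python
import statistics


def madqc(values, threshold=5.0, min_count=20):
    """Apply the MAD rule to one station's values for one named month.

    Returns the accepted values. A series with fewer than `min_count`
    values is rejected entirely (an empty list is returned).
    """
    if len(values) < min_count:
        return []
    med = statistics.median(values)
    # Median absolute deviation: the median distance from the median.
    mad = statistics.median([abs(v - med) for v in values])
    # Keep a value unless its distance from the median exceeds 5 * MAD.
    return [v for v in values if abs(v - med) <= threshold * mad]


if __name__ == "__main__":
    januaries = [10.0 + 0.1 * i for i in range(20)] + [40.0]
    print(len(madqc(januaries)))
```

One thing to watch in a sketch like this: if more than half the values are identical the MAD is zero, and every value that differs from the median is then rejected.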

So far I’ve found MADQC to be reasonable at rejecting the grossest non climatic errors.

Let’s compare the ccc-gistemp analysis using the ISTI Stage 3 dataset versus using the GHCN-M QCU dataset. The analysis for each hemisphere:

For both hemispheres the agreement is generally good and certainly within the published error bounds.

Zooming in on the recent period:

Now we can see the agreement in the northern hemisphere is excellent. In the southern hemisphere agreement is very good. The trend is slightly higher for the ISTI dataset.

The additional data that ISTI has gathered is most welcome, and this analysis shows that the warming trend in both hemispheres was not due to choosing a particular set of stations for GHCN-M. The much more comprehensive station network of ISTI shows the same trends.

Thursday, July 24, 2014

The WMO Commission for Climatology meeting and developments on the WMO front

The World Meteorological Organization’s Commission for Climatology had its four-yearly meeting in Heidelberg, Germany, from 3-8 July, preceded by a Technical Conference from 30 June – 2 July. The Commission is the central body for climate-related activities in WMO, and has a major role in establishing international standards and setting international work programs in the climate field, particularly through setting up networks of Expert Teams and Task Teams to work on particular issues. Its President (re-elected at the meeting) is Tom Peterson of NCDC, who will be well-known to many of you. The International Surface Temperature Initiative was set up as the result of a resolution of the last Commission for Climatology meeting, in 2010.

I made a presentation to the Technical Conference on the current status of ISTI. By happy coincidence, this presentation was scheduled for the morning on 1 July, a few hours after the release of the first version of the ISTI databank. The presentation appeared to be well-received; there were few direct questions or follow-ups, but the pile of leaflets we brought describing ISTI (once they got there, after a couple of bonus days enjoying Berlin with the rest of my luggage) was a lot smaller at the end of the week than it was at the start. One particular reason for targeting the Commission audience is that many of the attendees at Commission meetings are senior managers in their national meteorological services (often the head of the climate division, or equivalent), and so potentially have more influence over decisions to make data available to projects such as ISTI than individual scientists would.

Slow progress is also being made in two other areas of WMO of interest to ISTI. The inclusion of at least some historic climate data amongst the set of products which countries agree to freely exchange has been a long-standing goal of ours. The key decisions on this will be made at the full WMO Congress, which will be held next year, but progress to date (including through the recent WMO Executive Council meeting) is encouraging. There are also moves to include the month’s daily data in monthly CLIMAT messages, which are the principal means of exchanging current climate data through the WMO system but currently only contain monthly data. This will be very useful for the ongoing updating of data sets, as it will make daily data available which can be assumed to be for a full 24-hour day and is likely to have received at least some quality control (neither of which is necessarily true for the real-time synoptic reports which are the primary current source of recent daily and sub-daily data). Considerable technical work remains to be done, though, to implement this, even once it is formally endorsed.

Data rescue and climate database systems continue to be a high priority of the Commission, with several initiatives outlined at the meeting. Among them are proposals for an international data rescue portal, which (among other things) would potentially facilitate crowd-sourced digitisation. It is, however, an indication of how much work still remains to be done in many parts of the world that, according to results of a survey reported at the meeting, 25% of responding countries still stored their country’s climate data in spreadsheets or flat files, and 40% had a climate database system which was not fully functioning or not functioning at all.

The Commission also agreed to establish a new Task Team on Homogenisation. The full membership (and chairing) of this group are not yet clear but I will almost certainly be part of it. This team will be working closely with ISTI, but will also have a major focus on supporting the implementation of homogenised data sets which contribute to operational data products nationally and internationally.

Also of interest to ISTI is a new WMO initiative to formally recognise “centennial stations”, which, as the name implies, are stations which have existed with few or no changes for 100 years or more. Countries are to be asked to identify such stations, whose data will clearly be of considerable value to ISTI, if not already part of our databank. Free access to data and relevant metadata are among the recommendations for centennial stations.

And one advantage of holding an international meeting during the World Cup: it provides an instant conversation-starter with delegates of almost any country. (Perhaps fortunately for the Brazilian delegation, the meeting finished just before the semi-finals).

(Update 5 August: the resolution which came out of the WMO Executive Council meeting is available at

https://docs.google.com/a/noaa.gov/file/d/0B8DhC1GSWSmxbUs1c2UtOGZpeEk/edit).

Monday, July 21, 2014

Later talks from SAMSI / IMAGe workshop

Following up on the previous post, there were two further recorded talks at the event.

http://video.ucar.edu/mms/image/samsi2014_jonathan_woody.mp4 - Jonathan Woody gave a talk on analyses of snow depth.

http://video.ucar.edu/mms/image/samsi2014_bo_li.mp4 - Bo Li provided a talk on model selection in the use of palaeodata analyses.

We hope to have a meeting report out within a matter of days to weeks. We will post this here.

Overall there was a lot of active participation and many new directions to be taken in the analysis of surface temperatures. Our thanks go out to both SAMSI and IMAGe for facilitating this meeting and to all the participants for being active. More details to appear soon ...

Friday, July 11, 2014

Talks from SAMSI / IMAGe workshop on International Surface Temperature Initiative

We are currently in-situ in Boulder at a workshop organized with SAMSI and IMAGe. Work is ongoing and a formal write up will follow after completion. Most, but not all, of the talks have been streamed and are available for viewing.

http://video.ucar.edu/mms/image/samsi2014_richard_smith.mp4 - Richard Smith provided an overview of the SAMSI program and their expectations for the workshop.

http://video.ucar.edu/mms/image/samsi2014_peter_thorne.mp4 - I provided an overview of the ISTI program and progress to date

http://video.ucar.edu/mms/image/samsi2014_peter_thorne_jared.mp4 - I deputized for Jared to provide an overview of the databank process

http://video.ucar.edu/mms/image/samsi2014_kate_willet.mp4 - Kate Willett provided an overview of progress with creation of benchmarks and remaining challenges.

http://video.ucar.edu/mms/image/samsi2014_lucie_vincent.mp4 - Lucie Vincent provided an overview of typical inhomogeneities found in station timeseries and some of their likely causes.

http://video.ucar.edu/mms/image/samsi2014_jaxk_reeves.mp4 - Jaxk Reeves provided an overview of at most one changepoint techniques.

http://video.ucar.edu/mms/image/samsi2014_colon_gallagher.mp4 - Colin Gallagher provided an overview of fitting regression models.

http://video.ucar.edu/mms/image/samsi2014_robert_lund.mp4 - Robert Lund provided an overview of multiple changepoint techniques.

http://video.ucar.edu/mms/image/samsi2014_enric_aguilar.mp4 - Enric Aguilar and http://video.ucar.edu/mms/image/samsi2014_victor_venema.mp4 Victor Venema provided an overview of several state of the art climate homogenization techniques.

http://video.ucar.edu/mms/image/samsi2014_matt_menne.mp4 - Matt Menne provided an overview of the Pairwise Homogenization Algorithm and Bayes Factor Analyses by NCDC and their benchmarking.

http://video.ucar.edu/mms/image/samsi2014_peter_thorne2.mp4 - I provided an overview of uncertainty quantification in climate datasets.

http://video.ucar.edu/mms/image/samsi2014_colin_morice.mp4 - Colin Morice provided an overview of the HadCRUT4 uncertainty estimation techniques.

http://video.ucar.edu/mms/image/samsi2014_doug_nychka.mp4 - Doug Nychka provided an overview of spatial statistical aspects.

http://video.ucar.edu/mms/image/samsi2014_jeff_whitaker.mp4 - Jeff Whitaker provided an overview of comparisons between surface temperature products and dynamical reanalyses driven solely by observed SSTs and surface pressure measurements.

http://video.ucar.edu/mms/image/samsi2014_enric_aguilar2.mp4 - Enric Aguilar gave a talk on problems in some typical data sparse non-N. American / European series.

http://video.ucar.edu/mms/image/samsi2014_finn_lindgren.mp4 - Finn Lindgren provided a talk on spatial statistical aspects.

Monday, June 30, 2014

Global Land Surface Databank: Version 1.0.0 Release

The International Surface Temperature Initiative is pleased to release version 1 of a new monthly dataset that brings together new and existing sources of surface air temperature. Users are provided a way to more completely track the origin of surface air temperature data from its earliest available source through its integration into a merged data holding. The data are provided in various stages that lead to the integrated product.

This release is the culmination of three years' effort by an international group of scientists to produce a truly comprehensive, open and transparent set of fundamental monthly data holdings. The databank has been previously available in beta form, giving the public a chance to provide feedback. We have received numerous comments and have updated many of our sources.

This release consists of:

Over 50 distinct sources, submitted to the databank to date in Stage 0 (hardcopy / image; where available), Stage 1 (native digital format), and Stage 2 (converted to common format and with provenance flags).
All code to convert the Stage 1 holdings to Stage 2.
A recommended merged product and several variants which have all been built off the Stage 2 holdings. 2 ASCII formats are provided (ISTI format, GHCN format), along with a CF Compliant netCDF format.
All code used to process the data merge, along with statistical auxiliary files.
Documentation necessary to understand at a high level the processing of the data, including the location of the manuscript published in Geoscience Data Journal.

The entire databank can be found here and the merged product is located here. Earlier betas are also found here. Because the databank is version controlled, we welcome any feedback. We will be providing updates on the blog regarding any new releases.

For more information, please visit our website: www.surfacetemperatures.org
General Comment? Please email general.enquiries@surfacetemperatures.org

Location of all stations in the recommended version of the Stage Three component of the databank. The color corresponds to the number of years of data available for each station.

Station count of the recommended merge by year from 1850 to 2010. Databank stations are shown in red, compared to GHCN-M version 3 in black.

Saturday, June 28, 2014

Understanding the effects of changes in the temperature scale standards through time

Since records of surface temperature started being made, there have been iterations of the fixed-point standards used by national metrological institutes (that is not a typo). Assuming that all meteorological measurements through time have been made to such standards (which may be a considerable stretch), these revisions would have imparted changes to the records that are not physical in origin. As part of MeteoMet, efforts have been made to understand this. It is a relatively small effect compared to the effects of other long-recognized data issues. Nevertheless, it is important to consider all sources of potential bias properly, systematically and as exhaustively as possible.

The work itself was led by Peter Pavlasek of the Slovak Institute of Metrology. His introduction is reproduced below:

Temperature is one of the main quantities measured in meteorology and plays a key role in weather forecasts and climate determination. The instrumental temperature record now spans well over a century, with some records extending back to the 17th century, and represents an invaluable tool in evaluating historic climatic trends. However, ensuring the quality of the data records is challenging, with issues arising from the wide range of sensors used, how the sensors were calibrated, and how the data were recorded and written down. In particular, the very definition of the temperature scales has evolved. While they have always been based on calibration of instruments via a series of material phase transitions (fixed points), the evolution of sensors, measuring techniques and revisions of the fixed points used has introduced differences that may lead to difficulties when studying historic temperature records. The conversion program presented here deals with this issue for 20th century data by implementing a proposed mathematical model to allow the conversion from historical scales to the currently adopted International Temperature Scale of 1990 (ITS-90). This program can convert large files of historical records to the current international temperature scale, a feature which is intended to help in the harmonisation processes of long historic series. This work is part of the project “MeteoMet” funded by EURAMET, the European Association of National Metrology Institutes, and is part of a major general effort to identify the several sources of uncertainty in climate and meteorological records.

Michael de Podesta, who has served on the steering committee since ISTI's inception, reviewed the software for ISTI and provided the following summary:

Assuming that calibration procedures immediately spread throughout the world, homogenisation algorithms might conceivably see adjustments in 1968, with smaller adjustments in 1990.

If undetected, the effect would be to create a bias in the temperature record. This is difficult to calculate since the bias is temperature dependent, but if the mean land-surface temperature is ~10 °C and if temperature excursions are typically ±10 °C, then one might expect the effect to be that records prior to 1968 were systematically overestimated by about 0.005 °C, and records between 1968 and 1990 by about 0.003 °C.

Michael's full summary, which includes some graphical and tabular material, can be found here.
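To give a feel for the size of the 1968 to 1990 figure, here is a rough sketch of the arithmetic. It does not use the MeteoMet conversion model, which is not reproduced in this post. Instead it assumes the simple linear approximation T68 ≈ 1.00024 × T90 that is commonly used in oceanography for temperatures of roughly -2 °C to 40 °C. The function names and the uniform spread of temperatures between 0 °C and 20 °C are assumptions made purely for illustration.

```python
# Illustrative sketch only: this is NOT the MeteoMet conversion model.
# Assumption: the linear oceanographic approximation T68 ~= 1.00024 * T90,
# reasonable only for roughly -2 degC to 40 degC.
IPTS68_PER_ITS90 = 1.00024


def ipts68_to_its90(t68_celsius):
    """Convert an IPTS-68 reading (degC) to ITS-90 with the linear approximation."""
    return t68_celsius / IPTS68_PER_ITS90


def mean_scale_offset(mean_temp_c=10.0, excursion_c=10.0, steps=201):
    """Average of (T68 - T90) over temperatures spread uniformly around the mean.

    The uniform spread is a hypothetical choice; real temperature
    distributions will differ.
    """
    low = mean_temp_c - excursion_c
    high = mean_temp_c + excursion_c
    temps = [low + (high - low) * i / (steps - 1) for i in range(steps)]
    offsets = [t - ipts68_to_its90(t) for t in temps]
    return sum(offsets) / len(offsets)


if __name__ == "__main__":
    print(f"Mean IPTS-68 minus ITS-90 offset: {mean_scale_offset():.4f} degC")
```

Under these assumptions the mean offset comes out at about 0.0024 °C, the same order as the roughly 0.003 °C quoted above for records between 1968 and 1990. The pre-1968 figure depends on the earlier scale definitions and is not covered by this approximation.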

The code package runs on the Windows operating system. It is available here.

Wednesday, June 4, 2014

Paper describing benchmarking concepts in OA review

Just briefly to note that a discussion paper authored by the members of the benchmarking working group is now open for comment. This paper discusses the concepts and frameworks that will underpin all aspects of the benchmarking and assessment exercise. It's open to review until July 30th. Please do, if you have time and inclination, pop along and have a read and provide a constructive (!) review. The discussion site is at http://www.geosci-instrum-method-data-syst-discuss.net/4/235/2014/gid-4-235-2014.html .

Also, watch this space at the end of this month for exciting developments on the first pillar of the ISTI framework - the databank.

Finally, we are rapidly hurtling towards the SAMSI/IMAGe/ISTI workshop on surface temperatures and their analyses. It's going to be a busy few weeks, so expect this blog to be somewhat less moribund than of late ...