If you were paying attention to an earlier post characterizing the first version beta release, you will have noted that the databank timeseries behaves subtly differently from the 'raw' GHCNv3.
The early period record is slightly cooler than the GHCNv3 estimates, while the last decade is warmer. The net impact is to increase the apparent trend. This pattern is present in all the merge variants to a greater or lesser degree, which raises the logical question of why the difference arises. Is it because the databank's larger number of stations samples areas of the globe that GHCNv3 did not sample, and which behaved differently from the restricted GHCNv3 sample? Or is it down to additional station sampling in areas already sampled by GHCNv3? And if so, why? The two graphs below do the obvious thing and split it out by averaging separately over gridboxes present in both datasets and over those present only in the databank. (A much smaller population of gridboxes is present in v3 but not in the databank; it is far too small to have a material impact on the global estimates considered here.)
With GHCNv3 gridbox sampling (concentrate on, or try to spot, the difference between red and blue)
New gridboxes.
So, most of the difference appears to reflect better sampling of regions already sampled. The question of why, and what impact it has on homogenization efforts, is 'future work' ... and is why we now need multiple groups to take up the challenge of creating new data products from the databank.
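For anyone wanting to try a similar split themselves, here is a minimal sketch of the idea in Python with NumPy. It is not the code used to produce the graphs above. It assumes both datasets have already been gridded onto the same regular latitude-longitude grid as arrays shaped (time, lat, lon) with NaN marking empty gridboxes, that the masks are formed per time step, and that cos(latitude) is an adequate area weight. The function name `split_gridbox_means` and its arguments are hypothetical.

```python
import numpy as np

def split_gridbox_means(databank, ghcn, lats):
    """Average anomalies over shared and databank-only gridboxes.

    databank, ghcn: arrays shaped (time, lat, lon), NaN where a box is empty
                    (assumed layout, not the databank's own file format).
    lats: gridbox-centre latitudes in degrees, shaped (lat,).
    """
    has_db = ~np.isnan(databank)
    has_v3 = ~np.isnan(ghcn)
    shared = has_db & has_v3       # boxes sampled by both datasets
    new_only = has_db & ~has_v3    # boxes sampled only by the databank

    # Assumption: cos(latitude) approximates relative gridbox area.
    weights = np.cos(np.deg2rad(lats))[None, :, None] * np.ones(databank.shape)

    def weighted_mean(values, mask):
        # Zero out boxes outside the mask so NaNs never enter the sums.
        w = np.where(mask, weights, 0.0)
        v = np.where(mask, values, 0.0)
        total = w.sum(axis=(1, 2))
        with np.errstate(invalid="ignore", divide="ignore"):
            # Time steps with no boxes in the mask come back as NaN.
            return np.where(total > 0, (v * w).sum(axis=(1, 2)) / total, np.nan)

    return {
        "databank_shared": weighted_mean(databank, shared),
        "ghcn_shared": weighted_mean(ghcn, shared),
        "databank_new": weighted_mean(databank, new_only),
    }
```

Comparing `databank_shared` with `ghcn_shared` isolates the effect of extra stations in regions GHCNv3 already samples, while `databank_new` shows how the newly sampled regions behave on their own.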
I put the individual station GHCN and ISTI records into a gadget here. It makes it easier to compare stations with ISTI vs GHCN adjusted and unadjusted. It has a Google Maps interface for accessing stations.
It helps too to see where GHCN has gaps that ISTI might be filling, as you suggest.
Thanks Nick. This looks really interesting. We'll take a look as we count down to coming out of beta. The visualization aspects may help us to further refine some of the processes and tests.