In this post, I want to take up a particularly striking aspect of the Dai paper (and some other earlier papers on drought). To do so, though, I first need to fill readers in a little on a beautiful piece of statistical machinery called Principal Component Analysis. We will not delve into the mathematics of it here; I refer you to the Wikipedia page, which will be inscrutable to non-technical readers but is a good foundation for the mathematically inclined. However, readers need to get at least a little intuition for what principal components mean, and I will try to supply that.
Suppose we have some data, and the data has more than one dimension. The simplest case is points in a plane - two dimensional data. Suppose the data form a cloud like the fuzzy grey dots in this picture:
So the long arrow here is known as the first principal component, and the short arrow is the second principal component. If you know the values of both components for a given point, you know everything about where the point is (you can recreate the original x and y values - I'll spare you the math). But these are more interesting directions in which to think about this particular data set: in some sense they capture its natural features.
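For the mathematically inclined, here is a minimal sketch of the two-dimensional case in Python with NumPy. The tilted, elongated cloud is synthetic (an assumption standing in for the grey dots), and the principal components come from diagonalizing its covariance matrix. The last step checks the claim that knowing both component values recovers the original x and y.

```python
import numpy as np

# Synthetic stand-in for the grey dots: a long, thin cloud tilted by 30 degrees.
rng = np.random.default_rng(0)
theta = np.deg2rad(30)
raw = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])   # long axis sd 3, short axis sd 0.5
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
points = raw @ rot.T + np.array([10.0, 5.0])              # rotate each point, then shift it

# PCA: center the cloud and diagonalize its covariance matrix.
mean = points.mean(axis=0)
centered = points - mean
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigvals)[::-1]                         # eigh sorts ascending; flip it
eigvals, eigvecs = eigvals[order], eigvecs[:, order]      # column 0: long arrow, column 1: short arrow

# Each point's two component values: how far it lies along each arrow.
scores = centered @ eigvecs

# Knowing both component values recovers the original x and y.
reconstructed = scores @ eigvecs.T + mean
print(np.allclose(reconstructed, points))                  # True
```

The first column of `eigvecs` points (up to sign) along the 30-degree tilt that was built into the synthetic cloud, and its eigenvalue is far larger than the second one.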
Now, you can do the same thing in three dimensions. Googling around, I couldn't find a really great illustration, but this one (from here) will have to serve:
Now, we humans with our monkey++ brains can only think in three dimensions. But computers don't care, and if you let them run long enough, they will happily crunch through data in four, five, or in general N dimensions. And so Principal Component Analysis - aka PCA - has been used for a huge variety of problems that involve looking for significant patterns in data - finding "average" faces, genetic analysis, etc. Indeed, one of my youthful research forays explored the possibility of using this technique to track down Internet malfeasors (a paper which is now up to 228 citations, pretty good for a technique that I now judge to be completely impractical).
In any case, now imagine taking the world and dividing it up into square cells, 2.5° on a side, and computing the Palmer Drought Severity Index for each square in a particular year. Consider that collection of numbers for that particular year to be one data point in a very high-dimensional space (the number of dimensions being equal to the number of 2.5° cells required to cover the world: 144 × 72 = 10,368, except we'd probably economize somehow near the poles). Now, if we have all the years from 1900 to 2008, that forms a cloud of 109 points in this very high-dimensional space. And the computer will cheerfully crunch the mathematical algorithm to produce the principal components for that data cloud (it's called diagonalizing the covariance matrix, but you don't need to know that to appreciate what it's doing). And so it will figure out the direction of the first principal component, the second principal component, and all the others (though after the first few, things are going to get noisy as hell with only 109 points in 10,368 dimensions; indeed, once the average is subtracted, 109 points can spread out along at most 108 independent directions, but never mind that). And it will tell us how big each principal component is (how preferentially the data tends to line up along that particular direction, i.e. how long and thin the data cloud looks in that particular direction).
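A sketch of the same computation in the high-dimensional setting, again in Python with NumPy. The PDSI values here are random numbers (a placeholder, not Dai's data), arranged as one row per year and one column per cell of a full 2.5° grid. Rather than building the 10,368 × 10,368 covariance matrix, it takes the singular value decomposition of the centered data matrix, a standard equivalent route to the same principal directions. The last line shows why most of the components are empty here: once the mean is removed, 109 points span at most 108 directions.

```python
import numpy as np

years = np.arange(1900, 2009)               # 109 years
n_lat, n_lon = 72, 144                      # 180/2.5 latitude rows, 360/2.5 longitude columns

# Placeholder data: random numbers standing in for annual PDSI, one row per year
# and one column per grid cell.
rng = np.random.default_rng(42)
pdsi = rng.normal(size=(years.size, n_lat * n_lon))   # shape (109, 10368)

# Remove each cell's long-term mean, then take the SVD of the anomaly matrix.
# Its right singular vectors are the eigenvectors of the covariance matrix,
# so the 10,368 x 10,368 covariance matrix never has to be built.
anomalies = pdsi - pdsi.mean(axis=0)
u, s, vt = np.linalg.svd(anomalies, full_matrices=False)

variances = s**2 / (years.size - 1)         # covariance eigenvalues, largest first
pcs = vt                                    # row k is the direction of principal component k + 1

print(pcs.shape)                            # (109, 10368)
print(np.sum(variances > 1e-10 * variances[0]))  # 108 nonzero variances
```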
Ok, but if we have a "direction" in this very high-dimensional space, what the hell do we do with that? Well, we map it back onto the original grid cells, and show a map whose color in each cell indicates how much that cell's PDSI aligns with a particular principal component.
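Mapping a direction back onto the globe is essentially a reshape. This sketch assumes the cells were flattened in row-major (latitude, longitude) order on a 2.5° grid; the principal-component vector itself is a random placeholder, and the actual coloring of the map (for example with a plotting library) is left out.

```python
import numpy as np

n_lat, n_lon = 72, 144          # 2.5-degree grid: 180/2.5 rows, 360/2.5 columns

# Hypothetical principal-component direction: one weight (loading) per grid cell,
# stored as a flat vector in row-major (latitude, longitude) order.
rng = np.random.default_rng(1)
pc = rng.normal(size=n_lat * n_lon)
pc /= np.linalg.norm(pc)        # principal directions are unit vectors

# Put each cell's loading back in its place; a plot would color cells by these values.
pc_map = pc.reshape(n_lat, n_lon)

# Cell-center coordinates, so each map entry can be tied to a location.
lats = 90 - 2.5 * (np.arange(n_lat) + 0.5)     # 88.75 (north) down to -88.75 (south)
lons = -180 + 2.5 * (np.arange(n_lon) + 0.5)   # -178.75 (west) to 178.75 (east)
```

The flattening order is the design choice that matters: whatever order was used to turn each year's grid into a data point must be reused when turning a direction back into a map.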
Now, before I show you the result of that, I wish to stress that this technique is a purely statistical technique for number-crunching. The algorithm is just thinking about data points in a high-dimensional space - it doesn't know anything about temperatures or rainfall or clouds or droughts or carbon dioxide. In short, there is no actual climate physics in it whatsoever. In no sense is this a climate model - just a statistical technique for analyzing the existing observations.
Ok, so here's what the first principal component looks like in Dai's paper:
This is the 1950-2008 trend map we saw yesterday. If you look at Australia, Scandinavia, Africa, you'll see the pattern is very similar. The US is a bit different, in that the PCA first component finds significantly more southwest drought than the trend. We'll get to that in a minute. But otherwise, these two maps are qualitatively similar.
So in a sense, that PCA algorithm seems to be saying "the most important thing that I'm finding going on in this data is that there is a trend toward drought in thus and so places". And if we now look at how much of this particular principal component there is in each year (how far along the long arrow each grey dot is), we get this graph:
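The yearly values in that graph are projections: each year's pattern of anomalies is dotted with the unit vector of the first principal component. Here is a minimal sketch with synthetic data in which a fixed spatial pattern is given a steadily growing amplitude and buried in noise; the grid size, trend strength and noise level are all made-up assumptions. PCA recovers both the pattern and its time series, up to sign.

```python
import numpy as np

rng = np.random.default_rng(7)
years = np.arange(1900, 2009)
n_cells = 500                                    # a small hypothetical grid

# Synthetic data: one fixed spatial pattern whose strength grows steadily over
# time (a stand-in for a drying trend), plus independent noise in every cell.
pattern = rng.normal(size=n_cells)
pattern /= np.linalg.norm(pattern)
amplitude = 0.5 * (years - years.mean())         # the planted trend
data = np.outer(amplitude, pattern) + rng.normal(size=(years.size, n_cells))

anomalies = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(anomalies, full_matrices=False)
pc1 = vt[0]                                      # unit vector: the "long arrow"

# How far along the long arrow each year lies: project it onto PC1.
pc1_series = anomalies @ pc1

# PC signs are arbitrary, so compare absolute values; both come out near 1 here.
print(abs(np.corrcoef(pc1_series, amplitude)[0, 1]))
print(abs(pc1 @ pattern))
```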
So that's the single most important thing going on in the data (the "PC1, 7.1%" label at the top of the graph says that this particular direction accounts for 7.1% of the total variance in the data). Now, what about the second principal component? In a way, that's even more fascinating:
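A percentage like that is the component's variance divided by the total variance summed over all cells. A minimal sketch, again on a random placeholder matrix rather than real PDSI data:

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(109, 300))            # hypothetical years x cells matrix
anomalies = data - data.mean(axis=0)

_, s, _ = np.linalg.svd(anomalies, full_matrices=False)
variances = s**2 / (anomalies.shape[0] - 1)   # variance along each principal direction

# Share of the total variance captured by each component; a label such as
# "PC1, 7.1%" would correspond to explained[0] being 0.071.
explained = variances / variances.sum()

# The total is the same whether summed cell by cell or component by component.
print(np.isclose(variances.sum(), anomalies.var(axis=0, ddof=1).sum()))  # True
print(f"PC1 explains {100 * explained[0]:.1f}% of the variance")
```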
Ok, a pattern of wetness in the southwestern US, dryness in Brazil, Indonesia, Australia and Southern Africa. What is this? How about this description:
These are from the Wikipedia page on the El Niño-Southern Oscillation. And if you stare at the map above while reading it, I think you'll agree that the description matches the second principal component map very well. And indeed, if you look at the temporal pattern of how much the years line up with this El Niño-like pattern, you get this:
The black line is the amount of the second principal component in the PDSI data. The red line is a commonly used indicator of how strong El Niño is (it's actually the atmospheric pressure difference between Tahiti and Darwin, Australia, known as the Southern Oscillation Index), shifted by six months. You can see a pretty good degree of agreement between these lines: they aren't exactly the same, but they are clearly capturing the same phenomenon qualitatively.
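Comparing a PC time series with a shifted index amounts to a lagged correlation. The sketch below uses synthetic monthly series (assumptions, not the data behind the graph) in which a hypothetical `pc2` series echoes the index six months later, and scans a range of lags for the strongest absolute correlation; the absolute value is used because the sign of a principal component is arbitrary. The direction of the shift in the actual figure isn't stated, so the lag convention here is an assumption too.

```python
import numpy as np

def lagged_correlation(series, index, lag):
    """Pearson correlation between series[t] and index[t - lag].

    A positive lag pairs each value of `series` with the value of `index`
    `lag` steps earlier (here, months).
    """
    if lag > 0:
        a, b = series[lag:], index[:-lag]
    elif lag < 0:
        a, b = series[:lag], index[-lag:]
    else:
        a, b = series, index
    return np.corrcoef(a, b)[0, 1]

# Hypothetical monthly data: a random stand-in for the pressure-difference index,
# and a PC time series that repeats it six months later, plus noise.
rng = np.random.default_rng(11)
n_months = 600
soi = rng.normal(size=n_months)
pc2 = np.zeros(n_months)
pc2[6:] = soi[:-6]
pc2 += 0.5 * rng.normal(size=n_months)

# Scan lags of up to a year either way; use |r| because PC signs are arbitrary.
lags = range(-12, 13)
best_lag = max(lags, key=lambda k: abs(lagged_correlation(pc2, soi, k)))
print(best_lag, round(lagged_correlation(pc2, soi, best_lag), 2))
```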
As a scientist, I find this amazing. You feed this PDSI data into a completely physics-agnostic statistical algorithm, and it comes out with "The most important thing going on here is an overall trend toward drying in certain places", which roughly matches what climate models say global warming will do, and then it adds "And the second most important thing going on is this oscillation back and forth between drought in some other places", which very closely matches the well-known El Niño phenomenon.
So let's match up three things now: the picture of where climate models say drought will be (top), the picture of the first principal component (middle), and the picture of the drying trend of the last 60 years (bottom):
Stare and stare. Now, there's a great deal of uncertainty here. The climate models are clearly not perfect. Extrapolating existing trends could completely fail to foresee non-linear reorganizations of the climate. But suppose we had to bet (and really, we do, don't we?). I'd make two points: the places that are very dry in all three pictures don't look to me like a great bet (that would be you, Mediterranean and eastern Australian farmers). And in the regions where there is uncertainty, my instinct would be to bet on the middle picture as capturing the best idea of regional drought patterns.
The differences between the second and third pictures are particularly interesting in North America. As you can see, the main difference is that the PCA says the southwestern US is drying, while the trend picture shows that much more weakly. And in the comment I referenced yesterday, Dai writes:
Some of these regions, such as the United States, have fortunately avoided prolonged droughts during the last 50 years mainly due to decadal variations in ENSO and other climate modes, but people living in these regions may see a switch to persistent severe droughts in the next 20–50 years, depending on how ENSO and other natural variability modulate the GHG-induced drying.
Dai is basically saying that parts of the US have been protected by a trend toward more El Niño from 1950 to the early 1990s, with the resulting wet southwestern winter tendency offsetting the global warming signal, but it's not clear that will continue. And indeed, that trend hasn't held up so well in the last ten years, and there have been a lot of really nasty western wildfires, as well as epic beetle outbreaks:
Having recently left California, I guess I have voted with my feet.
I still have the bit between my teeth on this drought stuff. Tomorrow I'll try to take a deeper look at the PDSI and how much confidence it's reasonable to have in it as an indicator.
Note: This post is part of the Future of Drought Series on Early Warning.