I caught up with Met Office data visualiser extraordinaire (and old chum), Neil Kaye, to talk about the process of making his new plot on visualising global temperatures, and why he made certain decisions. Here is the plot:
Doug: What gave you the idea for the plot, what was the inspiration?
Neil: Originally the idea was balloons, because I had the idea that the higher up they were, the larger they would be. It was also going to be an animation of them taking off into the atmosphere year by year. However, for testing I used circles.
Then I thought the idea would work if it were called bubbles. We now might have the bubble coming out of a hole somewhere at the bottom. The idea for the bubble plot of temperature data came from a David McCandless plot on the strength of scientific evidence for popular health supplements, called “Snake Oil Supplements?”
It occurred to me that a similar idea could be used to look at global temperature.
It also happened that the Met Office wanted a good way to visualise global temperature, following on from the success of Ed Hawkins’s “Spiralling global temperatures” [Editor’s note: Not only did Ed’s Spiral go Viral, but a suspiciously similar plot was shown at the opening ceremony of the Rio Olympic Games].
Doug: What were you trying to get across?
Neil: I think [the plot] is very effective at showing the recent warming. One reason I used bubbles is that perhaps they are more engaging than bars.
Doug: Can you talk through some of the details that you’ve had to make decisions about?
Neil: The plot is created with d3, using the scatterplot code as a basis.
I used transparency so you could see where the overlaps were and also read the year value beneath other bubbles.
The year label is also being used to label the median value of the data point. I purposely made the more recent years appear in larger text for emphasis. The colour scale is colour blind proof and is adapted from the LinearL colour scheme (I took the central three quarters of the scale).
I used my climpal software to generate 6 RGB values for the scale, having removed both ends of the scale.
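Editor’s note: for readers who want to try something similar, here is a minimal sketch of picking 6 evenly spaced colours from the central three quarters of a colour ramp. It is not how climpal works internally (that is not described here), and the ramp below is a placeholder grey scale, not the real LinearL values. The function name `sample_central` and its parameters are made up for illustration.

```python
def sample_central(ramp, n=6, keep=0.75):
    """Return n RGB tuples evenly spaced over the central `keep` fraction of `ramp`.

    `ramp` is a list of (r, g, b) tuples ordered from one end of the scale
    to the other; n must be at least 2.
    """
    lo = (1.0 - keep) / 2.0          # fraction trimmed from the low end
    hi = 1.0 - lo                    # same fraction trimmed from the high end
    colours = []
    for i in range(n):
        t = lo + (hi - lo) * i / (n - 1)      # position along the full ramp, 0..1
        pos = t * (len(ramp) - 1)             # position in units of ramp entries
        k = min(int(pos), len(ramp) - 2)      # index of the entry just below
        frac = pos - k                        # how far towards the next entry
        a, b = ramp[k], ramp[k + 1]
        # Linear interpolation of each RGB channel between neighbouring entries.
        colours.append(tuple(round(a[c] + (b[c] - a[c]) * frac) for c in range(3)))
    return colours


# Placeholder ramp (NOT the real LinearL values): black -> mid grey -> white.
PLACEHOLDER_RAMP = [(0, 0, 0), (128, 128, 128), (255, 255, 255)]
palette = sample_central(PLACEHOLDER_RAMP)
```

Swapping in the actual LinearL table for `PLACEHOLDER_RAMP` would give a trimmed 6-colour scale of the kind described above.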
Doug: Where can people get the data, if they want to make their own plots?
Neil: The temperature data is the HadCRUT4 near surface temperature data. It is the Annual Global temperature. I use the widest confidence intervals to display the bubble size:
Columns 11 and 12 are the lower and upper bounds of the 95% confidence interval of the combined effects of all the uncertainties described in the HadCRUT4 error model (measurement and sampling, bias and coverage uncertainties).
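Editor’s note: a rough sketch of pulling those two columns out of one row of the annual file. It assumes whitespace-separated rows with the year in column 1 and the median anomaly in column 2; check this against the file’s own format notes. The function name `parse_annual_row` and the example row are invented for illustration and are not real HadCRUT4 values.

```python
def parse_annual_row(line):
    """Parse one whitespace-separated row of the annual time series.

    Assumed layout (verify against the data documentation):
    column 1 = year, column 2 = median anomaly,
    columns 11 and 12 = lower and upper bounds of the combined 95% interval.
    """
    fields = line.split()
    year = int(fields[0])
    median = float(fields[1])
    lower = float(fields[10])    # column 11 (1-based)
    upper = float(fields[11])    # column 12 (1-based)
    return {"year": year, "median": median, "ci_width": upper - lower}


# Synthetic row for illustration only: 12 columns, zeros where the
# intermediate uncertainty bounds would normally be.
example = parse_annual_row("2000 0.30 0 0 0 0 0 0 0 0 0.20 0.40")
```

The `ci_width` value is the quantity that would then be mapped to bubble size.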
Following on from advice from John Kennedy I lagged the data by 4 months, so for example the value for 1998 was a monthly average from September 1997 to August 1998.
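Editor’s note: the 4-month lag can be sketched as follows, assuming the monthly anomalies are held in a dictionary keyed by (year, month). The function name `lagged_annual_mean` and the numbers in the example are made up; only the September-to-August window comes from Neil’s description.

```python
def lagged_annual_mean(monthly, year, lag_months=4):
    """Mean of the 12 months ending `lag_months` before the end of `year`.

    `monthly` maps (year, month) -> anomaly, with month in 1..12.
    With lag_months=4 the window for 1998 is September 1997 to August 1998.
    """
    last = year * 12 + (12 - lag_months - 1)   # 0-based month count of the final month
    values = []
    for offset in range(12):
        y, m = divmod(last - offset, 12)       # step back one month at a time
        values.append(monthly[(y, m + 1)])     # convert back to a 1..12 month
    return sum(values) / len(values)


# Synthetic data for illustration: 0.0 for every month of 1997, 1.2 for 1998.
synthetic = {(1997, m): 0.0 for m in range(1, 13)}
synthetic.update({(1998, m): 1.2 for m in range(1, 13)})
value_1998 = lagged_annual_mean(synthetic, 1998)
```

Here the window takes four months from 1997 and eight from 1998, so `value_1998` is two thirds of the 1998 monthly value.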
I updated the plot following a suggestion from Kate Willet to use the X-Axis for ENSO. Despite more overlapping, I think it works very well:
For El Nino data I have used the Southern Oscillation Index (SOI), defined as the normalized pressure difference between Tahiti and Darwin. I used the file soi.dat obtained from the UEA Climatic Research Unit.
Tellingly, the conversation did not consider whether the visualization conveys an accurate representation of the data.
It’s not a bad point, but no visualisation perfectly represents data. Until we find a way of doing that, we’ll have to trade off representation, engagement, dynamics, simplicity and complexity, etc.
I think this offers a nice, new approach.
OK.
Overall, it looks nice. Part of me really wants to see an animated version of this where the bubbles bubble up from the bottom and jostle together as bubbles do. But that’s not the part of my brain that you’d trust with anything as serious as plotting global mean temperature.
I don’t like the bubbles*. The radius is associated with the uncertainty, but the uncertainty is only meaningful in one direction. If the x-axis turns out to be time, then you’ll really need to rethink the bubbles. Plus, there are generic problems with areas vs lengths for interpretation of figures.
The colours represent wildly different spans of time. And the labels on the right seem to put years in the wrong category e.g. 2000 should be on the bluer side of the divide.
There’s no indication of units on the second plot and only a vague indication of meaning.
I like that you can see the relationship between the El Nino/La Nina axis and global temperature.
*in this specific case. I generally love bubbles.
Wouldn’t it be better if the bubble’s area, rather than its radius, were proportional to the uncertainty, so that at a glance we can visually compare uncertainties?
So the area of the bubble represents a temp? Or the diameter?
And the answer! NEITHER.
animated bubbles would be great! cf gapminder: https://www.gapminder.org/tools/#_data_/_lastModified:1521497556000;&chart-type=bubbles
I agree that visually we process areas better than radii, so the uncertainty should be proportional to $r^2$, not $r$
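Editor’s note: the area-versus-radius point raised in the comments can be made concrete with a small sketch. If the radius scales linearly with the uncertainty, doubling the uncertainty quadruples the bubble’s area; if the radius scales with the square root, the area doubles. The function names and the scale factor `k` are hypothetical, and this does not claim to reproduce the scaling used in Neil’s plot.

```python
import math


def radius_linear(uncertainty, k=1.0):
    """Radius proportional to the uncertainty (area grows with its square)."""
    return k * uncertainty


def radius_area_true(uncertainty, k=1.0):
    """Radius proportional to sqrt(uncertainty), so area is proportional to it."""
    return k * math.sqrt(uncertainty)


def circle_area(radius):
    return math.pi * radius * radius


# Ratio of bubble areas when the uncertainty doubles from 0.2 to 0.4.
ratio_linear = circle_area(radius_linear(0.4)) / circle_area(radius_linear(0.2))
ratio_area_true = circle_area(radius_area_true(0.4)) / circle_area(radius_area_true(0.2))
```

With these definitions `ratio_linear` comes out at about 4 and `ratio_area_true` at about 2.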