Google Flu Trends


More On Google Flu Trends: Brilliant Predictor Or Cautionary Tale

In case you missed this excellent post on whether ‘Google Flu Trends’ is prescient or wrong, you’ve got a second chance to hear the details today on Radio Boston.

The segment features Keith Winstein, an MIT computer science graduate student and my former colleague at The Wall Street Journal, exploring what might account for the dramatic divergence between Google’s flu data and the official CDC flu numbers. “This could be a cautionary tale about the perils of relying on these ‘Big Data’ predictive models in situations where accuracy is important,” Winstein said in an interview with CommonHealth.

Here are some more of his thoughts:

The issue that’s interesting from the computer science perspective is this: Google Flu Trends launched to much fanfare in 2008 — it was even on the front page of the New York Times — with this idea that, as the head of said at the time, they could out-perform the CDC’s very expensive surveillance system, just by looking at the words that people were Googling for and running them through some statistical tools.

It’s a provocative claim and if true, it bodes well for being able to track all kinds of things that might be relevant to public health. Google has since launched Flu Trends sites for countries around the world, and a dengue fever site.

So this is an interesting idea, that you could do public health surveillance and out-perform the public health authorities [which use lab tests and reports from ‘sentinel’ medical sites] just by looking at what people were searching for.

Google was very clear that it wouldn’t replace the CDC, but they have said they would out-perform the CDC. And because they’re about 10 days earlier than the CDC, they might be able to save lives by directing anti-viral drugs and vaccines to afflicted regions.

And their initial paper in the journal Nature said the Google Flu Trends predictions were 97% accurate…

That was astounding. However, it is often a problem with computers that they only tell us things we already know. When you give a computer something unexpected, it does not handle it as well as a person would.

Shortly after that report of 97% accuracy, we had that unexpected swine flu, which was a different time of year from the normal flu season, and it was different symptoms from normal, and so Google’s site didn’t work very well.

[Carey asks: And the accuracy went down to 20-something percent?]

To a 29 percent correlation, and it had just been 97 percent. So it was not accurate.