Google in blue, CDC in red. Note the dramatic divergence toward 2013. (Keith Winstein, MIT)
Has Google’s much-celebrated flu estimator, Google Flu Trends, gotten a bit, shall we say, over-enthusiastic?
Last week, a friend commented to Keith Winstein, an MIT computer science graduate student and former health care reporter at The Wall Street Journal: “Whoa. This flu season seems to be the worst ever. Check out Google Flu Trends.”
Hmmm, Winstein responded. When he checked, he saw that the official CDC numbers showed the flu getting worse, but not nearly at Google’s level. (See the graph above.) The dramatic divergence between the Google data and the official CDC numbers struck him: Was Google, he wondered, prescient or wrong?
He began to explore — as much as a heavy grad-student schedule allows — and shares his thoughts here. Our conversation, lightly edited:
I accept the caveat that these predictive algorithms are not your speciality, but still, from highly informed, casual observation, what are you seeing, in a highly preliminary sort of way?
Well, I’m certainly not an expert on the flu. The issue that’s interesting from the computer science perspective is this: Google Flu Trends launched to much fanfare in 2008 — it was even on the front page of the New York Times — with this idea that, as the head of Google.org said at the time, they could out-perform the CDC’s very expensive surveillance system, just by looking at the words that people were Googling for and running them through some statistical tools.
It’s a provocative claim and if true, it bodes well for being able to track all kinds of things that might be relevant to public health. Google has since launched Flu Trends sites for countries around the world, and a dengue fever site.
So this is an interesting idea, that you could do public health surveillance and out-perform the public health authorities [which use lab tests and reports from ‘sentinel’ medical sites] just by looking at what people were searching for.
Google was very clear that it wouldn’t replace the CDC, but they have said they would out-perform the CDC. And because they’re about 10 days earlier than the CDC, they might be able to save lives by directing anti-viral drugs and vaccines to afflicted regions.
And their initial paper in the journal Nature said the Google Flu Trends predictions were 97% accurate…
That was astounding. However, it is often a problem with computers that they only tell us things we already know. When you give a computer something unexpected, it does not handle it as well as a person would.