MURFREESBORO, Tenn. (BURRISS) -- I remember reading in Isaac Asimov’s “Foundation” trilogy about a man who fancied himself to be an archeologist. But he thought all he had to do was read books to understand ancient history. No digging in the dirt and musty old ruins for him. Virtual archeology was good enough.
Today we call that kind of thing “data mining,” and if you listen to the so-called experts, you can find anything you want just by looking at, and then manipulating, virtual data. No need to get your hands dirty in the real world. Just go to a database.
Well, during this year’s flu season, Google published estimates showing the number and severity of cases. Unfortunately, its numbers were about twice as high as the numbers from the Centers for Disease Control, and, it turns out, about twice as high as reality.
So where did Google get its numbers? They were an estimate based on the number of flu-related searches the service recorded. That’s right: Google told the world how many flu cases there were, based on how many people did an Internet search for the word “flu.” When the erroneous numbers were pointed out, Google officials merely said they had to tinker with their algorithm.
Somehow the algorithm failed to account for people who didn’t have the flu but were merely curious. Sort of like Homeland Security assuming everyone who searches for the word “bomb” is a terrorist.
But this wasn’t the first time Google got the numbers wrong. In 2009 the service badly underestimated the number of flu cases during the H1N1 outbreak, which the World Health Organization declared a global pandemic.
Fortunately, the CDC used real cases involving real patients and real doctors, and was thus able to get real numbers, not virtual guesses.
It’s an oft-repeated computer aphorism: garbage in, garbage out. Perhaps we ought to add: don’t try to fool reality.
I’m Larry Burriss.