Researchers Use Big Data To Seek ‘Unique Fingerprint’ Of Long-Term Lyme Disease Symptoms

In this 2014 file photo, an informational card about ticks distributed by the Maine Medical Center Research Institute is seen in the woods in Freeport, Maine. (Robert F. Bukaty/AP)


By Richard Knox

One of the hottest fashions in science these days is Big Data: the idea that revelations can be teased out from great masses of information. Now, some researchers are using the strategy to pry open the black box of Lyme disease.

Four decades after the tick-borne infection first came to light in the vicinity of Lyme, Connecticut, the small world of Lyme-focused researchers isn’t even close to understanding why the disease often seems to plague its victims with disabling immunologic and neurologic problems that can persist for years.

Dr. John Aucott thinks Big Data can change that. “We’re really embarking on a new stage — a new era,” says Aucott, director of the year-old Lyme Disease Clinical Research Center at Johns Hopkins University School of Medicine.

The first step is to show that chronic Lyme disease “is a real illness,” Aucott says. “Many people don’t believe it exists because there’s no objective underpinning.”

That is, there’s no diagnostic test — a biological marker that’s present in people who suffer from chronic Lyme disease symptoms and absent in others. Consequently, the disorder widely called “chronic Lyme disease” is a grab bag of a diagnosis — and probably not a single disorder.

“Chronic Lyme usually refers to a very heterogeneous population with nonspecific ailments,” Aucott says. “Some may be related to Lyme, others to other tick-borne infections or illnesses we can’t define accurately.”

Aucott and his colleagues have just published some of the first Big Data-derived evidence in the journal mBio. They’ve found a set of activated genes in immune cells of patients newly infected with the Lyme disease bacterium, compared to similar people without Lyme.

“[The study results] may finally start cracking the mystery of why people fail therapy.”

– Dr. Harriet Kotsoris, Global Lyme Alliance

Intriguingly, some of these genes were still activated six months later, even among patients with verified Lyme disease who were successfully treated with antibiotics. Some of these genes overlapped with those activated in autoimmune diseases such as lupus and arthritis — a hint that the Lyme disease bacterium can have a lasting effect on the immune system, a leading hypothesis that has lacked concrete evidence until now.

Dr. Harriet Kotsoris, chief science officer of the Global Lyme Alliance, says the results are provocative. “It may finally start cracking the mystery of why people fail therapy and give us an insight of the genetic makeup of post-treatment Lyme disease syndrome,” she says. “And importantly, it may offer a diagnostic profile.”

One big problem is that patients who believe they have chronic Lyme disease can test negative for antibodies for the infection. That doesn’t mean they weren’t infected, but it does leave them in diagnostic limbo.

To uncover the gene-expression pattern, Aucott and his colleagues had to do 73 million gene sequence “reads” for each of the study subjects — 29 with Lyme disease and 13 controls. That’s the Big Data. They’ve since sampled the immune cells of 175 patients.
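Conceptually, finding “activated” genes means comparing each gene’s expression level between infected patients and controls, and flagging genes where the difference is too large to be chance. Here is a toy illustration of that comparison for a single hypothetical gene, using a Welch t-test on made-up expression values — a sketch of the idea only, not the study’s actual RNA-seq pipeline:

```python
import math
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / math.sqrt(var_a / len(a) + var_b / len(b))

# Made-up expression values (arbitrary units) for one hypothetical gene:
patients = [118, 125, 131, 122, 127, 119, 124, 130]  # infected subjects
controls = [101, 98, 104, 99, 102, 100]              # uninfected controls

t = welch_t(patients, controls)
print(f"t = {t:.2f}")  # a large |t| flags the gene as differentially expressed
```

A real analysis repeats a comparison like this across tens of thousands of genes, which is why each subject requires millions of sequence reads and why corrections for multiple testing become essential.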


Big Data Hubris? Where Google Flu Trends Went Wrong

(Flu graph: Keith Winstein, MIT)

Last January, MIT computer science graduate student (and former Wall Street Journal reporter) Keith Winstein reported on the dramatic divergence between Google’s flu data and the official CDC flu numbers: Is Google Flu Trends Prescient Or Wrong?

“This could be a cautionary tale about the perils of relying on these ‘Big Data’ predictive models in situations where accuracy is important,” Winstein said in an interview with CommonHealth.

Bingo. A paper by Northeastern University researchers and others, just out in the journal Science, looks at where Google Flu Trends went wrong — and presents the errors as exactly that sort of cautionary tale. And one of the morals of “The Parable of Google Flu: Traps in Big Data Analysis” is that Google needs to share its workings more openly with other research outfits. From news@Northeastern:

By incorporating lagged data from the Centers for Disease Control and Prevention as well as making a few simple statistical tweaks to the model, Lazer said, the GFT [Google Flu Trends] engineers could have significantly improved their results. But in a companion report also released Thursday on the Social Science Research Network—an online repository of scholarly research and related materials—Lazer and his colleagues show that an updated version of GFT, which came about in response to a 2013 Nature article revealing GFT’s limitations, does little better than its predecessor.

While Big Data certainly holds great promise for research, Lazer said, it will only be successful if the methods and data are made—at least partially—accessible to the community. But that so far has not been the case with Google.

“Google wants to contribute to science but at the same time does not follow scientific praxis and the principles of reproducibility and data availability that are crucial for progress,” Vespignani said. “In other words they want to contribute to science with a black box, which we cannot fully scrutinize and understand.”

If scientists are to “stand on the shoulders of giants,” as the old adage requires for moving knowledge forward, they will need some help from the giants, Lazer said. Otherwise failures like that with Google Flu Trends will be rampant, with the potential to tarnish our understanding of anything from stock market trends to the spread of disease.
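The fix Lazer describes — folding lagged CDC surveillance data into the model with simple statistical adjustments — amounts to recalibrating the search-based signal against ground truth rather than trusting it alone. A minimal sketch of that idea in Python, using entirely synthetic numbers, not the paper’s actual model or data:

```python
# Sketch of the recalibration idea: combine a biased, noisy search-based
# estimate (GFT-style) with lagged CDC surveillance data in a simple
# least-squares linear model. All numbers here are synthetic.
import numpy as np

rng = np.random.default_rng(42)
T = 100
true_ili = 2 + np.sin(np.linspace(0, 6 * np.pi, T))     # "true" flu activity
gft = 1.5 * true_ili + rng.normal(0, 0.5, T)            # biased, noisy search signal
cdc_lag = np.roll(true_ili, 2) + rng.normal(0, 0.1, T)  # CDC data, two weeks late

# Least-squares fit: ili ≈ a*gft + b*cdc_lag + c
X = np.column_stack([gft, cdc_lag, np.ones(T)])
coef, *_ = np.linalg.lstsq(X, true_ili, rcond=None)
combined = X @ coef

err_gft = np.mean((gft - true_ili) ** 2)
err_combined = np.mean((combined - true_ili) ** 2)
print(f"MSE, search signal alone:          {err_gft:.3f}")
print(f"MSE, combined with lagged CDC data: {err_combined:.3f}")
```

Because the combined model is fit against the surveillance data, it corrects the search signal’s bias and drift — which is essentially the kind of tweak the researchers argue could have significantly improved GFT’s results.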

Read the full Northeastern story here.