SCOTT SIMON, HOST:
The flu hits with the regularity of seasons. But it's hard to know when it will peak, and how it might spread. The Centers for Disease Control and Prevention want to forecast yearly bouts of influenza the way that meteorologists predict storms. So they've set up a contest every two weeks through March, in which competing teams submit predictions for when the flu will peak and how intense it will be.
Mark Dredze is a professor of computer science at the Johns Hopkins University. He analyzes social media data, to try to learn about health trends. Professor Dredze, thanks so much for being with us.
MARK DREDZE: I'm happy to be here.
SIMON: So how do you use social media to make a prediction on a medical matter?
DREDZE: We look at Twitter, specifically. And it turns out that when people are sick or not feeling well, they often tweet about how they're feeling. And if you look at millions of tweets like this, you can actually find patterns and use that to figure out on a national level who is sick and who's not. And when I mean who, I mean what states or what cities have elevated rates of flu.
SIMON: And this information could potentially be very useful to hospitals, doctors, public health people - right?
DREDZE: Sure. So a hospital, for example, during flu season wants to know how many people are going to walk in the door with the flu. If we have a bad outbreak, they're going to need to make sure there's extra beds available, call in extra doctors and nurses. So even just a couple-of-day lead time is very helpful.
SIMON: So what about this contest?
DREDZE: What the CDC is trying to do is not just figure out what's going on right now with the flu, but what's going to happen in a week or two weeks out. Traditionally, we do what's called surveillance, which is what's going on right now in the United States. And that's a really hard problem because if you think about it, without using something like Twitter, you have to ask people what's going on. And the CDC does this; they do a very good job. But it takes them about two weeks to get those responses collected. What we're trying to do is predict things like what week is going to be the worst for the flu season; when is it going to end?
SIMON: Do you run the risk of putting through the algorithms a lot of tweets from people who don't use, let's say, a word like fever in the same way people with flu would use it?
DREDZE: It's actually very tricky to use Twitter in order to get a flu rate. So, for example, someone might say, I have Bieber fever, which is not actually a disease that we care about tracking.
DREDZE: And so we use a series of sophisticated algorithms, and these algorithms can actually tell the difference between someone saying something like, I am afraid that I'm going to get the flu versus I'm afraid that I have the flu.
SIMON: 'Cause there are people who this time of year will sneeze and say, oh my gosh, I hope I don't have the flu.
DREDZE: Not only that but no doubt when people hear this story, they're going to go to Twitter and tweet about this story, and they're going to be talking about the flu. And our algorithms are smart enough to tell the difference between people talking about the flu versus people who are actually saying that they're infected with the flu.
SIMON: Mark Dredze is a professor of computer science at the Johns Hopkins University. Thanks so much for being with us.
DREDZE: It's been my pleasure.
Transcript provided by NPR, Copyright NPR.