BGSU research team takes on COVID with data ‘superpowers’
Unprecedented study combines about a dozen different data sources to track COVID case counts
By Bob Cunningham ’18
When Dr. Trent Buskirk was a child watching Saturday morning cartoons, his favorite show was “Super Friends.”
The show featured DC Comics superheroes such as Superman, Batman, Wonder Woman, Aquaman and their amazing friends, who team up as members of the Justice League. Buskirk liked that the foes were so formidable that the superheroes had to work together to defeat them.
That’s the same approach he’s taken as he tackles an ongoing COVID-19 research project for Bowling Green State University with the help of both undergraduate and graduate students.
“I love it when problems are bigger than me so that it requires a team of people who have different areas of expertise to solve the problem,” Buskirk said. “One of my favorite things to do on Saturday mornings when I was growing up was to watch ‘Super Friends,’ and the reason why was because I loved all of the different superpowers. In a way, I am part of a real-life ‘Super Friends’ in the real world with analytics superpowers that I bring to the table and other people bring medical powers that I don't have. We need to work together to solve these important societal problems, which, to me, is the most meaningful part of the work.
“It's the collaborative effort and this synergy that we get when we're working together on a common goal. We each have different strengths that we couldn't solve the problem individually. Collectively, however, we have a shot.”
Buskirk is the Novak Family Professor of Data Science and the chair of the Applied Statistics and Operations Research Department in the Allen W. and Carol M. Schmidthorst College of Business at Bowling Green State University.
BGSU students who helped Buskirk starting in the summer include Brian Blakely, a computer science major; Youzhi Yu, a doctoral candidate in data science; and Ben Gramza, a graduate candidate in statistics. David Reynolds, a graduate student in statistics from the University of Missouri, also worked on the project in the summer.
The research team had reinforcements join in the fall semester, including Parker Kemp, a graduate student in applied statistics; Ravi Singh, a recent BGSU master’s graduate in applied statistics; and Matt DeAmon, a business major with a concentration in analytics and intelligence.
Buskirk’s project is an unprecedented study because it combines about 12 data sources to track COVID case counts over time.
“Basically, when all of this started in April, I reached out to a few of my survey colleagues around the country and I said, ‘I'm thinking about tracking coronavirus symptoms over time and triangulating that with Twitter data and Google search data,’” he said. “‘Do you think that you would like to partner together to throw in some survey data?’”
Then he contacted colleagues at Gallup and Facebook, as well as a few other sources.
“Gallup does a nightly tracking poll, and it turns out that Facebook and Carnegie Mellon University entered into a partnership a few months ago to do the same thing,” Buskirk said. “They’re trying to track symptoms using samples that are selected from Facebook, and these data are made available to select researchers around the country.”
After starting with those two survey data sources, Buskirk added others available through non-disclosure agreements BGSU had with providers such as SafeGraph.
Buskirk and his team of student researchers then incorporated that data along with data from Johns Hopkins University and Medicine and the National Transportation Center at the University of Maryland, as well as weather data, survey and behavioral data, and passively collected distance data from mobile devices. They also use hospital admissions data or COVID-related outcomes and the social vulnerability index from the federal government, as well as other county-level community indicators that are better measured by the census and the American Community Survey.
“This project really captures my full perspective on data science and in surveys in particular,” Buskirk said. “We're able to use the survey data to give the ‘why’ of the matter and I'm able to use these other data sources to give me the ‘who,’ the ‘when’ and the ‘where,’ and so I'm trying to combine them all to tell a better story that has everything.
“Like with a good story, you have all of those things. It’s the same with a good analysis because you also want to have data sources for those questions being answered at some point.”
Buskirk said this project is the first he has worked on where he’s been able to get a “360-degree, comprehensive view” to answer all of the questions.
“It's partly because of the unprecedented times that we're in where many companies are willing to share data to help solve the COVID problem,” he said. “Whereas before, I might only be able to answer four of those questions with triangulating data sources.”
The team is also mining Twitter and Google search data to help with the research.
“This is an unprecedented opportunity to combine all of these things,” Buskirk said. “A cool derivative of this project is the social media data component. We are collecting data from about 20 different states or cities. Another derivative that we just started to get some preliminary results on is a battery of about five dozen county-level variables we are tracking that may predict case counts of COVID on a weekly basis. What we're trying to understand is whether there are community-profile variables that might change in importance as the coronavirus continues.”
Buskirk said predictions are important because COVID policies in the spring were very different from those in place now. “We know a lot more now than before,” he said. “We also know that risk factors can change as we set policies. Community-based variables might predict case counts over time, which really helps us get a good understanding of how we should write policies today for controlling the virus, as opposed to policies we wrote months ago.
“It's a really interesting take on population-based health science, data science and service science, and using all of these things together to help us better understand the etiology of the disease. That's what I've been doing this summer for BGSU and it's been super.”