As an IT-based GIS analyst for the last 30 years, I’ve spent a lot of time working with and thinking about data — how we collect data, how we create context by turning data into information, and then how we turn that information into knowledge by analyzing and interpreting it via GIS. In the last year or so, I’ve transitioned into a primarily data governance role in public health. In addition to being able to leave my generalist ways behind, I’ve had to get with the times and become more knowledgeable about big data and GIS, and how they impact public health.
The Three Vs
Big data are datasets so large and complex that the traditional systems and software many of us relied on in the past are becoming less and less relevant. Instead of databases, we now talk about “data lakes” and “data lakehouses”; instead of machine learning algorithms, natural language processing. Databases and machine learning are still, and will continue to be, part of the IT lexicon, but those of us who started in the early days of GIS have to think exponentially larger! There are more kinds of data (variety), arriving in greater amounts (volume), much faster than ever before and often in real time (velocity).
Syndromic Data Is Big Data
The practice of documenting and mapping illness clusters dates back to at least the 1850s. In the mid-1990s, there was a push in public health to identify illness clusters before they became larger outbreaks. As the threat of bioterrorism became more prevalent in the United States and around the world, public health professionals looked for ways to gather data in real time or near real time, so that analysts and investigators could anticipate events and first responders could prepare for large-scale events instead of just reacting.
Syndromic surveillance combines nontraditional data sources, like over-the-counter drug sales and school absenteeism rates, with more traditional data sources, like laboratory test results and physician diagnoses, to understand emerging public health events. It employs various analyses, including data projections and modeling of real-time data, to provide immediate information to the epidemiologists and analysts investigating and following up on potential outbreaks.
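To make the idea of analyzing near-real-time counts a little more concrete, here is a minimal Python sketch of one simple kind of aberration detection: comparing each day's count against a moving baseline of the preceding days. The function name, the 7-day baseline, the threshold of 3 standard deviations, the minimum standard deviation, and the visit counts are all illustrative assumptions, not the method of any particular surveillance system.

```python
from statistics import mean, stdev


def flag_aberrations(daily_counts, baseline_days=7, threshold=3.0, min_sd=1.0):
    """Return the indices of days whose count is unusually high.

    Each day is compared with the mean and standard deviation of the
    preceding `baseline_days` days. All parameter defaults are
    illustrative assumptions.
    """
    flagged = []
    for day in range(baseline_days, len(daily_counts)):
        baseline = daily_counts[day - baseline_days:day]
        mu = mean(baseline)
        # Floor the standard deviation so a perfectly flat baseline
        # does not cause a division by zero.
        sigma = max(stdev(baseline), min_sd)
        z_score = (daily_counts[day] - mu) / sigma
        if z_score > threshold:
            flagged.append(day)
    return flagged


# Made-up daily emergency department visit counts for one syndrome.
visits = [12, 15, 11, 14, 13, 12, 16, 14, 13, 31, 15]
print(flag_aberrations(visits))  # prints [9]
```

In this made-up series, only the day with 31 visits stands out against the week before it. A real system would also need to account for things like day-of-week effects and reporting delays, which this sketch ignores.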
By the early 2000s, social media had become a significant part of human communication, and new kinds of data were being generated. Instead of just structured data in databases and other data management tools, we began to see more unstructured data in the form of text- and image-based posts, audio files, PDF documents, and other qualitative data.
At the same time, sources of remote sensing data like satellite imagery and aerial photography were starting to become more widely available and could be incorporated into syndromic surveillance to understand the impact of climate on health variables such as vector-borne disease.
What Have Maps Got to Do With It?
Michael Goodchild, Professor of Geography Emeritus at the University of California, Santa Barbara, said: “... Big data has a role to play in what we might term spatial prediction, or the prediction of where rather than when.”
GIS can integrate data from many sources, including:
- medical surveillance data from systems like ESSENCE, GeoMedStat, and COVIDcast,
- structured data from hospital emergency departments and urgent care centers,
- unstructured data from social media, sensor data (everything from satellite imagery to video camera recordings and public transit), and more,
while languages and environments like R and SAS support map development through data cleansing, geocoding, and visualization.
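As a rough illustration of the "where" in spatial prediction, the following Python sketch counts already-geocoded cases per grid cell to show where they concentrate. The function name, the grid resolution, and the coordinates are hypothetical. Binning by raw latitude and longitude is a deliberate simplification; a real GIS workflow would typically use a projected coordinate system and proper spatial statistics.

```python
import math
from collections import Counter


def bin_cases(points, cells_per_degree=100):
    """Count cases per grid cell.

    `points` is an iterable of (latitude, longitude) pairs in decimal
    degrees. With 100 cells per degree, each cell spans 0.01 degrees,
    an arbitrary resolution chosen for illustration.
    """
    counts = Counter()
    for lat, lon in points:
        cell = (
            math.floor(lat * cells_per_degree),
            math.floor(lon * cells_per_degree),
        )
        counts[cell] += 1
    return counts


# Hypothetical geocoded case locations (not real data).
cases = [
    (34.4251, -119.7049),
    (34.4258, -119.7042),
    (34.4254, -119.7045),
    (34.4412, -119.6893),
]

cell_counts = bin_cases(cases)
busiest_cell, case_count = cell_counts.most_common(1)[0]
print(busiest_cell, case_count)  # prints (3442, -11971) 3
```

Three of the four made-up cases fall in the same cell, which is the kind of concentration an analyst would then examine on a map.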
When you add in the ability to monitor climate variability, environmental conditions, and their effects on the dynamics of infectious diseases, it is easy to see how GIS and related technologies can play a major role in predicting vector-borne diseases like malaria and dengue fever, and why they are essential to understanding how these diseases may affect the public.
GIS gives analysts and epidemiologists the ability to create context in public health data, turning information into knowledge that is not just informative but actionable. That lets public health analysts make fully formed, real-time decisions that can significantly improve outcomes for the people they serve.