
Data Quality in the Age of Cloud GIS and Crowdsourcing

Michael Johnson

Not long ago, acquiring spatial datasets for a GIS project required persistence, negotiation, and often substantial cost. Because data collection and acquisition demanded time and effort, practitioners tended to scrutinize datasets carefully before incorporating them into analysis. Today, cloud-based platforms, crowdsourced databases, and real-time streaming services have dramatically reduced those barriers. Vast quantities of spatial data can now be accessed within minutes.

This abundance raises an important question: does data quality still matter? The answer is unequivocally yes. In fact, the ease with which data can now be obtained makes critical evaluation more important than ever. When information flows freely, the responsibility to assess its accuracy, lineage, and suitability increases proportionally.

Be Critical — Even of Your Own Data

Mobile technologies have democratized spatial data creation. With a smartphone, anyone can collect GPS tracks and upload them to cloud-based GIS platforms. Collaborative initiatives such as OpenStreetMap demonstrate the power of community-driven mapping. However, convenience does not eliminate error.

Consider a GPS track collected around Kendrick Reservoir in Colorado using a smartphone and later mapped in ArcGIS Online. When symbolized by time progression, the earliest track points deviated significantly from the intended path, appearing to cut across private lots and terrain. Switching to satellite imagery confirmed the impossibility of those routes.

The discrepancy resulted from limited satellite visibility at the beginning of data collection. As additional satellites, Wi-Fi signals, and cell towers became available for positioning, accuracy improved. Initial errors extended as much as 600 meters from the reservoir and persisted for roughly ten minutes before stabilizing.

This example underscores a fundamental lesson: spatial accuracy, precision, collection method, timestamp, and device limitations must all be evaluated. Blind trust in automatically generated data can produce misleading interpretations.
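One practical response to this lesson is a simple sanity check before analysis. The sketch below is a minimal illustration in plain Python, not part of the original workflow: the coordinates, the haversine helper, and the 600-meter threshold (borrowed from the example above) are all assumptions chosen for demonstration.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flag_outliers(points, ref_lat, ref_lon, max_dev_m=600):
    """Return indices of track points farther than max_dev_m
    from a known reference location (e.g., the feature being mapped)."""
    return [i for i, (lat, lon) in enumerate(points)
            if haversine_m(lat, lon, ref_lat, ref_lon) > max_dev_m]
```

A check like this will not explain why early points drifted, but it flags them for review rather than letting them pass silently into a published map.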

Misleading Patterns: The Lyme Disease Example

Spatial misinterpretation is not limited to GPS tracks. Case counts for Lyme disease in Rhode Island towns between 1992 and 1998 were mapped using Esri tools for instructional workshops. The resulting choropleth map suggested fluctuating incidence patterns over time.

However, communication with the Rhode Island Department of Health revealed crucial context. During the 1980s and 1990s, the state prioritized Lyme disease surveillance and expanded outreach to healthcare providers, increasing reported cases through robust case classification. In 2004–2005, personnel changes and shifting priorities reduced active follow-up, leading to fewer reported cases.

Without this contextual knowledge, one might incorrectly conclude that disease incidence declined. In reality, reporting methodology had changed. Such information is often absent from standardized metadata, emphasizing the value of directly contacting data providers.

Understanding collection strategy, funding changes, and reporting protocols is essential. Otherwise, spatial analysis may reflect administrative shifts rather than genuine epidemiological trends.

Scale and Resolution: The “Walking on Water” Problem

Another instructive case involves a GPS track recorded along a Lake Michigan pier in Manitowoc. When mapped in a fitness application, the track appeared to traverse open water. The issue was not faulty movement but inadequate base map resolution: the pier was missing from the underlying dataset.

This scenario highlights the importance of scale and resolution. Modern GIS tools allow users to zoom to detailed scales instantly. Yet the data being viewed may have been collected at a much coarser scale. If analytical decisions are made at 1:10,000 while source data was compiled at 1:50,000, spatial precision becomes unreliable.

Mismatched scales can lead to flawed conclusions, property disputes, or worse. Resolution constraints must be acknowledged before performing high-stakes analysis.
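The scale mismatch can be made concrete with arithmetic. A common cartographic rule of thumb holds that a well-compiled map carries roughly half a millimeter of positional error at map scale; the sketch below applies that heuristic (the 0.5 mm figure and the function names are illustrative assumptions, not a standard prescribed by any particular GIS platform):

```python
def ground_uncertainty_m(scale_denominator, map_error_mm=0.5):
    """Approximate ground-distance uncertainty implied by a map scale,
    assuming ~0.5 mm of positional error on the compiled map."""
    return scale_denominator * map_error_mm / 1000.0

def scale_appropriate(source_scale_denom, analysis_scale_denom):
    """Data compiled at a coarser scale (larger denominator) cannot
    reliably support analysis at a finer scale (smaller denominator)."""
    return source_scale_denom <= analysis_scale_denom
```

Under this heuristic, 1:50,000 source data carries about 25 meters of ground uncertainty, so zooming the display to 1:10,000 does not make a boundary decision at that scale defensible.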

Responsibility in the Era of Instant Publishing

Today’s GIS platforms enable rapid sharing through web maps, embedded applications, and story-driven interfaces. Once published, maps can reach thousands or millions of viewers. The visual polish of modern cartography can create an illusion of authority, even when underlying data weaknesses persist.

Therefore, practitioners must evaluate data provenance, update frequency, collection methodology, and appropriate scale before analysis and publication. Metadata remains a cornerstone of responsible GIS practice. Just as users appreciate well-documented datasets, data creators must provide transparent documentation for others.
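A provenance review like the one described above can be partially automated. The sketch below checks a metadata record for a handful of fields; the field names are illustrative choices for this example, not drawn from any formal metadata standard:

```python
# Illustrative set of fields a reviewer might require before using a dataset.
REQUIRED_FIELDS = {
    "source",
    "collection_date",
    "collection_method",
    "scale",
    "update_frequency",
}

def missing_metadata(record):
    """Return a sorted list of required fields that are absent
    or empty in a metadata record (a plain dict)."""
    return sorted(f for f in REQUIRED_FIELDS if not record.get(f))
```

An empty result does not guarantee quality, of course; as the Lyme disease example shows, some context only surfaces by contacting the provider. But a failing check is a clear signal to pause before publishing.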

The principle is simple: greater accessibility brings greater accountability. In an era defined by cloud GIS and crowdsourcing, vigilance regarding data quality is not optional — it is foundational.

For further discussion of public-domain data considerations and metadata practices, resources such as The GIS Guide to Public Domain Data and the Spatial Reserves blog explore these issues in depth.
