Emerging Issue: Spatial Data Quality

Data quality is a problem we need to address if we in the geospatial industry expect to be a part of the enterprise IT picture. Our most pressing need is a simple, reliable way to answer: "Are these data fit for this purpose?" each time spatial data are merged or shared in an enterprise system.

Here's the problem. In the past, users captured individual spatial data sets for specific and often independent uses. Today, the spatial data used in enterprise systems flow in from many sources. Often the origins and capabilities of the data are unknown. Making the problem tougher, data have to be integrated and used quickly while they are still relevant.

The data quality problem is multi-faceted. To understand the issues, it helps to categorize the problem into general domains. Here's a rough cut at a few potentially useful categories.

Geometric domain - topology, proximity, directionality, alignment, co-ordinates, including method of collection GPS or non-GPS. Incompatible topology models, for example, can induce significant errors as data are merged.
Data domain - geocode/addressing, vector, raster, elevations. Each data type has its own unique characteristics and capabilities.
Application domain - topography, cartography, transportation, utilities, localized eCommerce. Each application has a unique set of data quality requirements.
Data management domain - database management; extract, transform, load; data merge, search and Web service. Data quality problems may be resolved or exacerbated within each data management function. Careful design of data management workflows can minimize problems.
Temporal domain - The time element can make spatial data much more useful. With thoughtful design, spatial and temporal data together have a wide range of powerful capabilities. But, spatiotemporal data carries significant complexity making data management and access tricky.
Political/cultural domain - Often political and cultural issues are the most difficult to resolve. If the people involved in a system are unwilling or incapable of sharing data, data quality suffers. Changing this kind of problem requires a deep understanding of organizational dynamics and information behaviors.
Economic/financial domain - Capturing and managing spatial data are generally expensive and labor-intensive. There has to be some kind of economic system that allows the people doing the work to be compensated. However, current intellectual property (IP) models tend to create problems if data with different IP constraints are merged.

There are undoubtedly many other useful categorizations and examples. The point is that each domain has a set of data quality issues that need to be addressed. The logical place to address those issues is within standards bodies and industry groups. OGC is just starting to work on spatial data quality and will likely become a focal point. The Infrastructure for SPatial InfoRmation in Europe (INSPIRE) outlines specific requirements for data that are to be aggregated into central EU systems.

While the spatial data quality problem still needs a lot of work, we are seeing some progress in two areas: geocoding/addressing and vector data. The geocoding area is relatively mature because large enterprises have long needed to optimize mass marketing, billing and other mail-related functions. A number of established vendors address enterprise requirements for converting addresses to explicit locations. Examples: SAS Institute/Dataflux, Pitney Bowes/Group 1, Trillium Software, QAS, SRC and Cquay. These companies generally compete on the basis of their capabilities and processing throughput for address standardization and geocoding.

One new startup, Proxix, is addressing the need for high-precision geocoding by collecting and using parcel geometry and data for the US. Proxix also provides a capability for selecting the best data source for each geocode based on a user-defined set of rules. A long-established company, DMTI Spatial, now offers "Location Hub," a product that broadly simplifies spatial data quality and management tasks.
Companies like Proxix and DMTI Spatial are addressing needs for high-precision location intelligence and spatial data quality. These new requirements will drive additional innovation from both established vendors and startups.

With a few notable exceptions, spatial data quality in the vector domain is less mature. The basic reason is that, historically, vector data were generally gathered for a specific purpose by a user. That user managed editing and error correction until the data were fit for their particular purpose. In spite of lengthy discussions and arguments about data sharing, users didn't have much real incentive to design or manage vector data for external uses.

Today, that is starting to change. Concepts like master spatial data management and location hubs are being implemented within enterprise systems. Spatial information infrastructures like Ordnance Survey's Master Map are well established. Whether you call it master data management, location hubs, or spatial information infrastructure, use of vector data across different applications is increasing.

One company, 1Spatial (formerly Laser Scan), has been automating vector data quality management for years. Its main product, Radius Studio, offers automated, rules-based data quality management tools for high-volume processes. Radius Studio also manages the integration of vector data from multiple sources. As enterprises increase their cross-process use of vector data, we will see companies like 1Spatial gain traction and spawn a new wave of innovation.

But our industry is a long way from simple, cheap, standard and versatile data quality solutions that address the many spatial data quality problem domains. Looking forward about 12 to 18 months, expect to see enterprise users focus on spatial data quality. Companies that can address these issues with innovative solutions (rather than cleverly re-packaging existing stuff) will do well. Also, expect to see users demand standardized interfaces and services. OGC and INSPIRE have important roles to play.

To summarize, enterprise users require effective, standard, predictable data quality. Those users increasingly want to use spatial data within their information systems. This situation creates a demand for broadly effective spatial data quality management - a demand that our industry has yet to address. But, there's a pony in there somewhere. We need to find it. Soon.