Spatial Information as a Service

It is hard to find an industry that is not benefiting from spatial information technology. In my view, the mere emergence of this online magazine [Location Intelligence] and its annual conference illustrates the impact of spatial IT, which is significant to business intelligence and spans every industry.

The effect of spatial information handling is clearly illustrated on the Directions Magazine website under the "My Industry" page, where you see an array of focused selections for product information, companies, events and other resources for specific industries. This is also evidenced by the sheer number of links and success stories that the "Mapping" and "GIS" disciplines have brought to the forefront for millions of implementers and IT stewards around the globe and across industries. For all of these examples, information management is the foundation of solid location intelligence for effective business processes.

When asked by the editors to provide an update and review of IBM's spatial database technology strategy, I found it difficult to just say, "sure, no problem." Why? Because IBM serves these burgeoning industry applications, and the industries themselves, with solutions for spatial information handling that are more than relational database technologies. We do this through a comprehensive middleware portfolio and an approach to creating business solutions for enterprises in tandem with business partners' technologies, as well as through focused acquisitions that expand and enhance the portfolio. IBM is more about delivering geospatial information as a service than GIS-specific analytics.

Figure 1 - IBM evolving information on demand structure.

Of course, IBM's database management offerings (primarily DB2 and Informix) are a key part of this open, standards-based, interoperable middleware for delivering these information services.

Let's get started with some details to update you on what IBM has for spatial information management, with a focus on high-priority capabilities necessary for supporting business processes where location matters: specifically, spatial standards, services-oriented architecture, spatial extract, transform, load (ETL), XML processing and management of documents, and delivering spatial information on demand for volatile, demanding business processes.

Information as a Service
Spatiotemporal information provides value to individual business processes in many industries, and IBM is helping customers generate new value from their business information with extensible database technology that allows them to integrate, analyze, and efficiently manage information assets throughout their life cycle. With strategic technology investments (especially in information integration) and recent acquisitions, IBM has developed a complete information infrastructure for delivering information as a service. This infrastructure will help businesses truly leverage their business information for strategic and competitive advantage.

When customers create information services, they are able to operate as an on-demand business - one whose business processes (integrated end-to-end across the company and with key partners, suppliers, and customers) can respond with flexibility and speed to customer or constituent demand, market opportunity, or uncontrollable external forces.

The primary goals of the on-demand strategy for spatially aware infrastructure are:

Increasing business flexibility
Leveraging information insights
Optimizing the IT infrastructure
Enhancing business resiliency and security

Figure 2 - IBM Information as a Service.

Spatiotemporal Business Value and Infrastructure Value Model
To assess this notion of information as a service and its importance to your functional needs, consider the value of spatial (and spatiotemporal) computing in the context of multiple stages for the enhancement of business processes. In its simplest form, spatial data enables location intelligence by putting a "dot" on the map, revealing explicit and implicit values related to some event or feature. This is the very first level at which IBM is delivering spatial computing, both to implement the infrastructure enterprises need and to discover the value it supplies across industries.

Traveling up the six phases of a maturity model, we can determine the value of and the most appropriate tier of computing to implement spatial analytics to provide value to business processes for any given application of information technology.

Figure 3 - Phases of GIS integrated enterprise solution maturity.

IBM Information Management
IBM software is structured in five categories for information technology, serving industries worldwide. The three categories where we offer spatiotemporal functionality lie in our Information Management, Rational and WebSphere portfolios. So, in addition to the database technology, which is the focus of this article, IBM offers spatial analytics access from the development and tooling environments of WebSphere and Rational, respectively, as well as location-based services implementation in WebSphere, implementing the OpenLS standard, for extending the information on demand infrastructure to mobile devices (WebSphere Everyplace Access).

DB2 and Informix products implement the same set of types and functions, defined by the current GIS industry specifications published by the Open Geospatial Consortium (OGC) and now adopted by ISO. So far, this Simple Features Specification is the only one that extends SQL to accommodate spatial data, and it is in wide use as an infrastructure for spatial data sharing, access and analytics. It concisely defines geometry types, hierarchies and spatial functions. DB2 Spatial Extender and Informix Spatial DataBlade implement these and were made possible by the extensible architectures of our database technology. They gain additional value and acceptance by being tailored to ESRI's ArcSDE application server - the gateway middleware on which database access for all of ESRI's client products is based.

In addition to the types and functions for spatial query, an index scheme that affords efficient navigation of tables and attributes is critical to providing a foundation for the software tiers of a practical solution deployment. DB2 implements the grid index, and Informix the R-tree index scheme, for creating efficient access methods. The extensibility of the servers also allows spatial logic to be applied in stored procedures and initiated by triggers, so that spatial data in the database can be shared among any business logic and application demands across organizations. With or without a full-function, stand-alone GIS software engine, businesses can achieve extensive spatial functionality that meets many business needs (applying spatial business logic in the server and supporting spatial queries). Implementing spatial in the server was an important advancement, because these spatial features and methods, indexes, triggers and stored procedures are given the same performance, security and maintainability that enterprise-wide solutions require, including solutions built on this extensible database foundation that are not themselves spatially aware.
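To make the idea of a grid index concrete, here is a minimal Python sketch of the general technique: each feature's bounding box is registered in every grid cell it touches, and a query inspects only the cells its own box touches before applying an exact overlap test. This illustrates the concept only and is not DB2's implementation; the class name `GridIndex`, the single fixed `cell_size`, and the helper `_overlaps` are all assumptions made for the example.

```python
from collections import defaultdict
from math import floor


def _overlaps(a, b):
    """True when two (xmin, ymin, xmax, ymax) boxes intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Toy single-level grid index over axis-aligned bounding boxes."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (col, row) -> ids of features touching that cell
        self.boxes = {}                # feature id -> bounding box

    def _cells_for(self, box):
        """Yield every grid cell that the box touches."""
        xmin, ymin, xmax, ymax = box
        first_col, last_col = floor(xmin / self.cell_size), floor(xmax / self.cell_size)
        first_row, last_row = floor(ymin / self.cell_size), floor(ymax / self.cell_size)
        for col in range(first_col, last_col + 1):
            for row in range(first_row, last_row + 1):
                yield (col, row)

    def insert(self, feature_id, box):
        self.boxes[feature_id] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(feature_id)

    def query(self, box):
        """Return ids whose boxes overlap `box`, visiting only nearby cells."""
        candidates = set()
        for cell in self._cells_for(box):
            candidates |= self.cells.get(cell, set())
        # Sharing a cell is only a hint; confirm with the exact box test.
        return sorted(fid for fid in candidates if _overlaps(self.boxes[fid], box))
```

The cell size is the tuning knob in a scheme like this: cells that are too small make large features occupy many cells, while cells that are too large leave too many candidates for the exact test.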

The Spatial Extender for DB2 and the Spatial DataBlade for Informix add spatial types, functions and indexing methods to IBM's databases. The Extender and DataBlade are plug-in components built on the extensibility features of these object-relational DBMSs, and both come with the server as standard features. This fundamental approach leads to robust, high-performance and easy-to-use spatial capabilities that conform to OGC's Simple Features specification. (Source: IBM Spatial Data Management: A Solid Foundation for ESRI's ArcGIS Solutions, IBM WebCast, Fall 2005.) As an example, combining the implementation and architecture of an ArcGIS software stack with ArcSDE and an IBM database provides key advantages over competing implementations. (Additional information: Spatial Integration Adapter/DB2.)

Stuck in flat land? Unique round-earth computing
Flatland, by Edwin A. Abbott, satirizes a world where everything exists in two-dimensional space, without even the concept of a reverse projection from flat to multi-dimensions. This is not unlike most of the GIS industry: information is stored and accessed in a two-dimensional world afforded by map projection transformations from one to another. Spatial information management in many industries can benefit from working with features without transformation prior to visualization. Projected databases have edges or, worse, storage schemes that produce multiple features where there is really only a single feature, purely as a result of the map projection, leaving the applications to 'work around' the issue.

The IBM database servers have been extended for spherical computing. Though it is not a 3-D system, DB2 Geodetic Extender treats the Earth as a globe rather than a flat map, making it easier to develop applications for business intelligence and e-government that require geographic location analysis. Most location information is collected using systems such as GPS and represented in latitude/longitude coordinates. Enterprise applications work better when the data is kept in this unprojected form, leaving map projections (earth to flat map) where they belong: in the presentation layer service used to display, print and disseminate maps.
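A small Python sketch can show why unprojected, round-earth computation matters at the "edge" of a flat map. It compares a great-circle distance (haversine formula on a sphere) with a naive flat calculation on raw longitude/latitude degrees for two points on either side of the 180° meridian. The spherical model, the mean-radius constant and the function names are assumptions made for illustration; this is not a description of how DB2 Geodetic Extender computes distances.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance in kilometers between two lon/lat points on a sphere."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def flat_degrees(lon1, lat1, lon2, lat2):
    """Naive planar distance in degrees, treating lon/lat as x/y on a flat map."""
    return sqrt((lon2 - lon1) ** 2 + (lat2 - lat1) ** 2)


# Two points on the equator, one degree apart, straddling the 180-degree meridian.
round_earth = great_circle_km(179.5, 0.0, -179.5, 0.0)  # about 111 km
flat_map = flat_degrees(179.5, 0.0, -179.5, 0.0)        # 359 degrees "apart"
```

On the globe the two points are neighbors; on the flat map they sit at opposite edges, which is exactly the kind of artifact applications otherwise have to work around.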

DB2 Spatial ETL
It's easy to load existing geographic data into DB2 and Informix, from an exchange file or the ubiquitous shapefile, for example, through IBM-supplied utilities or third-party ones such as ESRI's ArcSDE or Safe Software's FME. In general, however, loading data into a database is not as safe as you might think. Just as you cannot assume that an email attachment or Office document is free from viruses just because it is properly formatted, you cannot assume that geometric shapes are valid just because they come from a properly constructed geographic data file. Such files can be produced by many different software packages, and nothing guarantees that spikes, duplicate vertices, self-intersections and other problems are absent, since the files are, more often than not, non-topological structures. The OpenGIS Simple Features specification defines what is and is not allowed in a valid polygon or multi-polygon. But the problem is more complex than that. The spatial reference system (SRS) into which the data are loaded determines coordinate resolution, and resolution influences geometric validity. For example, what would be seen as distinct locations in a high-resolution SRS may collapse to a single point in a lower-resolution one, leading to duplicate-vertex errors. Invalid geometries, once they've made their way into a spatial database, can cause serious problems down the road, leading to aborted queries, inaccurate results or, in an enterprise implementation, failures in completing Web services processes, and even system crashes.
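The interaction between resolution and validity can be sketched in a few lines of Python. The example snaps a polygon ring to integer grid units at two different resolutions and reports where consecutive vertices have collapsed onto each other. The function names and the sample coordinates are hypothetical, and the check covers only duplicate vertices, not the full set of Simple Features validity rules or the validation logic of any IBM product.

```python
def snap_ring(ring, resolution):
    """Convert each (x, y) vertex to integer grid units of the given resolution."""
    return [(round(x / resolution), round(y / resolution)) for x, y in ring]


def duplicate_vertex_positions(ring):
    """Return indexes of vertices that are identical to the vertex just before them."""
    return [i for i in range(1, len(ring)) if ring[i] == ring[i - 1]]


# A closed ring with two vertices that are very close together, but distinct.
ring = [
    (0.0, 0.0),
    (10.0, 0.0),
    (10.0, 10.0),
    (10.0004, 10.0003),
    (0.0, 10.0),
    (0.0, 0.0),
]

fine = duplicate_vertex_positions(snap_ring(ring, 0.0001))  # vertices stay distinct
coarse = duplicate_vertex_positions(snap_ring(ring, 0.01))  # vertex 3 collapses onto vertex 2
```

The same input file is therefore clean in one SRS and invalid in another, which is why validity has to be checked against the target database rather than assumed from the source.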

This is why both the DB2 Spatial Extender and the Informix Spatial DataBlade always validate every geometry value on input; validation is not an option that can be skipped. This is important in establishing a data warehouse or data mart for multiple users. While this imposes a slight overhead on bulk loads and insert/update transactions, there is no getting around the need for it, and it is better to pay the price once, on input, than every time the geometry is accessed. Other spatial databases have validation routines, but they must be applied explicitly. Forget once, and you cannot guarantee the integrity of your database. With an IBM spatial database, you know that every geometry value in the warehouse will always be processed without problems.

Spatial ETL technology focuses on moving location data between [GIS] spatial databases such as DB2 and other applications (both spatial and non-spatial) that can benefit from using spatial data. The value of spatial data is now being discovered in the mass market through Internet mapping tools such as Google Earth. Spatial ETL is the key to unlocking vast spatial data holdings and making them available to the community at large. To be successful, a spatial ETL system must have the following capabilities:

RDBMS level integrity checks for legal geometry;
Geo-processing, for transforming the data into the desired structure of the destination system;
Multiple input system support, in order to extract data from one or more sources during an ETL operation;
Multiple output system support, in order to load data into one or more destinations during an ETL operation; and
Direct read, in order to provide an API that enables data to be read directly. (The API would still enable the client to take full advantage of the transformation capabilities of the spatial ETL tool.)

In short, the goal of Spatial ETL is to not only make data available in any format but also to present that data in the right structure. This is particularly important when moving data into a spatial database such as DB2, since the target database will almost certainly have a predefined schema that the spatial data needs to be transformed into in order to be useful for the applications that use the database. Spatial ETL complements both spatial data creation tools (GIS) and spatial databases (DB2) by providing the data conduit between data creation, data storage and end user applications.
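The capabilities listed above can be pictured as a small pipeline. The Python sketch below reads from several sources, reshapes each record into a predefined target schema, applies an integrity check, and loads the surviving rows into several destinations. Everything here is hypothetical: the function names, the field names `ID`, `shape`, `parcel_id` and `geom`, and the simple closed-ring check stand in for what a real tool such as FME or a database loader would do.

```python
def to_target_schema(record):
    """Reshape a source record into the (assumed) schema of the target table."""
    return {
        "parcel_id": str(record["ID"]),
        "geom": [tuple(point) for point in record["shape"]],
    }


def ring_is_closed(row):
    """Minimal integrity check: at least four vertices, first equal to last."""
    ring = row["geom"]
    return len(ring) >= 4 and ring[0] == ring[-1]


def run_etl(sources, transform, is_valid, sinks):
    """Extract from every source, transform, validate, and load into every sink."""
    loaded, rejected = 0, []
    for source in sources:
        for record in source:
            row = transform(record)
            if not is_valid(row):
                rejected.append(record)  # keep the original record for reporting
                continue
            for sink in sinks:
                sink.append(row)
            loaded += 1
    return loaded, rejected


# Two in-memory "sources" and two "destinations" stand in for real systems.
shapefile_like = [{"ID": 1, "shape": [[0, 0], [1, 0], [1, 1], [0, 0]]}]
exchange_like = [{"ID": 2, "shape": [[0, 0], [1, 0], [1, 1]]}]  # ring not closed
warehouse, mart = [], []
loaded, rejected = run_etl(
    [shapefile_like, exchange_like], to_target_schema, ring_is_closed, [warehouse, mart]
)
```

Rejecting the invalid record before it reaches either destination mirrors the point made earlier about validating on input rather than at query time.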

Our partner for spatial ETL, Safe Software, has been following the merging of the GIS and MIS worlds and now sees it happening at an accelerated pace. First it was the databases; now it is the tools that move data between databases. Partnerships between ETL companies (such as those featured in WebSphere Customer Center (formerly Ascential)) and spatial ETL companies (such as Safe Software) promise to make spatial data available to the enterprise as never before. Organizations with this "total ETL solution" will have a competitive advantage, as they will be able to exploit, visualize and make decisions with the ever-growing body of spatial data warehouses and marts available to enterprises.

Increased availability of non-relational data structured information and processes - DB2 XML implementation
Our focus for the upcoming DB2 release (code named Viper) is primarily performance and efficiency for XML data. This provides a unique advancement in XML data handling - strides beyond both our current extender, which maps between XML and relational data, and the competition. Spatial data is best stored and processed for applications using a database to better enable:

Managing large amounts of data (that is what a database is for);
Persistence and recovery - for example, transaction management;
Integrating with enterprise data warehouses and data marts; and
Supporting services-oriented architecture for wider spatial-temporal based processes/applications.

IBM is introducing technology that combines native XML storage with existing world-class relational storage, enabling seamless query of that data through both XQuery and SQL. The fully native XML implementation enables companies to build business-critical, XML-centric applications, setting the bar for flexibility, performance, scalability and availability for XML document storage and access. These native XML storage capabilities give customers ultimate flexibility for managing and accessing business information, while offering performance and scalability previously thought possible only from a relational database. In addition, the management tools and security that apply to DB2 databases will also apply to other native data, ensuring smarter, faster business practices.

The non-native ways of handling XML are deficient, as they stand on their own. Shredding XML means you lose the fidelity, or the hierarchy, of the XML document itself. Spatial information is increasingly "less structured and non-topological." We are providing functionality for platform and vendor independence, tooling for working with data that is available and easy to deploy, and suitability for both structured and unstructured information/data. Applications can use relational or XML data interchangeably, with XQuery, SQL/X, or both.
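The loss of hierarchy from shredding can be illustrated with a short Python sketch using only the standard library. A deliberately naive shredder splits a small document into one flat list of values per tag, with no parent keys, while a path query against the intact document can still reach a value through its position in the hierarchy. The sample document, its element names and the function names are invented for the example; a real shredder would carry keys and rebuild documents with joins, and DB2's native XML support is queried with XQuery rather than with this library.

```python
import xml.etree.ElementTree as ET

DOC = """<FeatureCollection>
  <Feature id="f1"><name>Depot</name><Point><pos>-104.99 39.74</pos></Point></Feature>
  <Feature id="f2"><name>Store</name><Point><pos>-105.27 40.01</pos></Point></Feature>
</FeatureCollection>"""

root = ET.fromstring(DOC)


def shred(document_root):
    """Naively split the document into one flat list of text values per tag."""
    tables = {}
    for element in document_root.iter():
        text = (element.text or "").strip()
        if text:
            tables.setdefault(element.tag, []).append(text)
    return tables


def position_of(document_root, feature_id):
    """Follow the hierarchy of the intact document to one feature's position."""
    element = document_root.find(f"./Feature[@id='{feature_id}']/Point/pos")
    return element.text if element is not None else None


flat_tables = shred(root)                 # names and positions, no longer linked
store_position = position_of(root, "f2")  # reached through the document structure
```

After shredding, nothing in the flat lists records which position belongs to which feature or that a position sat inside a `Point`; the intact document answers that question directly.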

The advantages of native XML include:

No XML parsing
No mapping to different data models
Flexible schemas
No translation of XQuery to SQL queries
No reconstruction of documents
Hierarchical storage of XML indexes, as native XML data type (not Varchar, not CLOB, and not object-relational)
Indexing on columns or contents (elements or attributes)
New methods for joins and query evaluations

For a deeper understanding of native XML data management, see the IBM XML Web cast (simple registration required).

IBM Spatial
IBM and its partner world are serving the broadening demand for spatial information for a location-aware IT enterprise through multiple components of a software portfolio. With the SQL extensions implemented as standards and with unmatched XML data management functionality, applications based on IBM software can move more quickly and efficiently to provide the information demanded by business processes.

Credits to sources for this article in IBM
Content for this article was drawn from multiple IBM sources: previous articles by the author, DB2 Magazine, IBM Web casts, and an Illuminata IBM web posting. Thanks also to my colleagues: Robert Uleman, for his guidance and instruction on the details of spatial RDBMS technology; Chris Couper, IBM Distinguished Engineer; and Marcel LaBlanc, Safe Software.