"The only way to discover the limits of the possible is to go beyond them into the impossible." Arthur C. Clarke
Soon after Codd
wrote his paper on relational algebra in 1970, relational databases
significantly changed the way people managed data. Today, relational
databases are the workhorses of enterprise data storage. Similarly,
imagine a world without email or the Internet. What will the next
"killer app" or "killer service" look like? What kinds of attributes
and features will it provide?
In this article, we provide a primer on geospatial technology. We then
explain possible reasons for growth in the geospatial industry, examine
Ingres' geospatial project, and relate the material to lessons about
open source as a protocol for business.
The Storm is Coming
Technology change has made spatially aware applications and devices
more affordable and accessible. This is based on smaller, faster, and
more power efficient chips. Increased network bandwidth for both wired
and wireless networking has improved the availability of spatial data.
The geographic information system (GIS) industry has traditionally been
dominated by a few large competitors, but new standards and competition
have started its evolution towards the mainstream. More importantly,
these standards and technologies have hastened the inclusion of spatial
awareness into applications from other industries. New opportunities
are emerging to add maps and spatial awareness to enterprise
information technology (IT) and to do so at a cost that the masses can
afford. The stage is set to provide new insights from existing data. As
countless more devices become spatially aware and interconnected, we
are going to experience an epic storm of spatial data.
There are two types of geospatial data: raster data and vector data.
Raster data are essentially pictures, although not always in the
visible spectrum. Satellite or aerial images are examples of raster
data. Vector data are a mathematical representation of real life.
Vector constructs include points, lines, polygons, and other shapes
which can be used to represent houses, roads, rivers, parks, and lakes.
The industry standards were published by the Open Geospatial Consortium (OGC)
and describe how raster, vector, and combined map data can and should
be represented. These standards include Web Coverage Service (WCS), Web Feature Service (WFS), and Web Map Service (WMS)
which describe serving raster data, vector data, and maps respectively.
OGC also defines how relational databases should store and provide
interfaces to act upon spatial data. Adhering to these standards means
systems can interoperate more easily. This enables using raster data
from one source, vector data from another, and combining them into a
map service that can be consumed by a large choice of software.
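To make the idea of a standard map service concrete, the following is a minimal sketch of how a client might build a WMS GetMap request. The parameter names follow the WMS 1.3.0 convention; the endpoint URL and the layer names are hypothetical and stand in for whatever a real server advertises.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height,
                     crs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request URL.

    base_url and the layer names are assumptions for illustration;
    a real server lists its layers in its GetCapabilities response.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


url = build_getmap_url(
    "https://maps.example.org/wms",    # hypothetical endpoint
    layers=["satellite", "roads"],     # hypothetical raster and vector layers
    bbox=(45.0, -76.0, 46.0, -75.0),   # WMS 1.3.0 with EPSG:4326 uses lat,lon order
    width=800,
    height=600,
)
print(url)
```

Because the request is just a URL with agreed parameter names, any client that speaks the standard can ask any compliant server for a map that combines raster and vector layers.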
Most of us are familiar with the concept of latitude and longitude with
zero degrees longitude centered on Greenwich, England. There are other
systems that have zero degrees centered on Moscow, Paris, and other
major cities. Each of these systems is a coordinate system. Since
geospatial data may be stored in any one of a number of coordinate
systems, it is important to be able to convert between them. The Open
Source Geospatial Foundation (OSGeo) sponsored software projects Proj.4 and csmap provide this functionality. Another name for a coordinate system is a spatial reference system.
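As a simplified illustration of converting between coordinate systems, the sketch below shifts a longitude between a Paris-based and a Greenwich-based prime meridian. It assumes the commonly cited offset of 2 degrees 20 minutes 14.025 seconds and ignores differences in datum and ellipsoid, which real conversion libraries must also handle. The function names are our own and are not part of Proj.4 or csmap.

```python
# Longitude of the Paris meridian east of Greenwich: 2 deg 20' 14.025"
# (assumed value; datum and ellipsoid differences are ignored here).
PARIS_OFFSET_DEG = 2 + 20 / 60 + 14.025 / 3600


def _wrap(lon_deg):
    """Wrap a longitude into the conventional [-180, 180) range."""
    return (lon_deg + 180.0) % 360.0 - 180.0


def paris_to_greenwich(lon_paris_deg):
    """Convert a longitude measured from Paris to one measured from Greenwich."""
    return _wrap(lon_paris_deg + PARIS_OFFSET_DEG)


def greenwich_to_paris(lon_greenwich_deg):
    """Convert a longitude measured from Greenwich to one measured from Paris."""
    return _wrap(lon_greenwich_deg - PARIS_OFFSET_DEG)


# Zero degrees in the Paris system lies a little over 2.3 degrees east of Greenwich.
print(round(paris_to_greenwich(0.0), 6))
```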
We have all been told that the shortest distance between two points is a
straight line. But on the surface of a sphere, the shortest path is
actually an arc. To complicate things further, most planets are not
perfect spheres but ellipsoids with imperfections. The science of geodetics deals with the measurement of the earth.
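The arc between two points on a sphere can be computed with the haversine formula. The sketch below assumes a perfectly spherical earth with a mean radius of 6371 km, so it only approximates what a geodetic library computes on an ellipsoid.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a perfect sphere


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# From the equator to the pole is a quarter of the sphere's circumference.
print(great_circle_km(0.0, 0.0, 90.0, 0.0))
```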
Why Use a Relational Database for Spatial Data?
There are a number of formats for storing spatial data, including
several that are just files on a disk. So, why burden oneself with the
overhead of a relational database? With one user, one set of data, and
fairly simple and unchanging demands for data, it is easy to make the
case for storing data as files on a disk. However, once you need to
share that data with a team of people, things become more complex. A
relational database management system (RDBMS) provides atomicity, consistency, isolation, and durability (ACID).
In short, this means that the database will ensure that your data is
not corrupted. An RDBMS also provides a client/server architecture that
allows data to be shared over a network. The security model of an RDBMS
supports roles that define who can view, modify, or delete data. These are
all important considerations when sharing data.
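The atomicity guarantee can be illustrated with a short sketch. It uses SQLite from the Python standard library purely as a stand-in (it is an embedded database, not a client/server RDBMS such as Ingres), and the `parcels` table is hypothetical. When the second statement in the transaction fails, the first is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT NOT NULL)")
conn.execute("INSERT INTO parcels (id, owner) VALUES (1, 'Alice')")
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE parcels SET owner = 'Bob' WHERE id = 1")
        # This violates the NOT NULL constraint, so the whole transaction fails.
        conn.execute("INSERT INTO parcels (id, owner) VALUES (2, NULL)")
except sqlite3.IntegrityError:
    pass

# The update to parcel 1 was undone together with the failed insert.
rows = conn.execute("SELECT id, owner FROM parcels ORDER BY id").fetchall()
print(rows)
```

With plain files on a shared disk, the equivalent failure could leave the data half updated.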
Ingres
Ingres was one of the original
RDBMSs and was born out of the INGRES project at the University of
California, Berkeley in the 1970s. In 1980, INGRES project founders
Michael Stonebraker and Eugene Wong created Relational Technology
Incorporated (RTI) based on the technology. RTI changed its name to Ingres
Corporation and was purchased by Ask Corporation in 1990. Computer
Associates acquired Ask in 1994. In 2005, Ingres was spun out of
Computer Associates with venture funding to form the current Ingres
Corporation. Today's Ingres Corporation is an open source startup based
in Redwood City, California. Ingres' revenues have recently grown to
$68M, despite the gloomy economy, making it currently the largest
independent open source RDBMS company.
Ingres competes with closed source offerings from Oracle, IBM, and Microsoft. The main open source RDBMS projects are MySQL, now owned by Sun Microsystems, and PostgreSQL.
O'Mahony and West propose that there are two major types of community: those
that are initiated at the grass roots and those sponsored by a for-profit
firm. In the context of the Ingres community today, the latter is a
Hindsight is 20/20
It is worth noting that Ingres was one of the first RDBMSs to support
geometry datatypes. Geometry datatypes provide mathematical constructs
such as points, lines, and polygons for describing objects and relating
them in Cartesian space. Many of these
constructs are used to enable geospatial technology to relate objects
on the surface of the earth. Even though Ingres supported geometry
types, it had no support for coordinate systems or geodetics, and its
geospatial functions were sparse. As the industry defined standards for
additional data types and functions in the late 1990s, work was needed
to update the code to support them. When the Ingres Spatial Objects
Library (SOL) was originally developed, the decision was made to
outsource its development. The deal left the intellectual property (IP)
in the hands of the outsourcing company, leaving Ingres with the rights
to distribute binaries, but not the code. Recall that in those days,
geospatial technology was a tiny niche and only those with deep pockets
and an urgent need for the technology were interested.
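To give a feel for what geometry datatypes in Cartesian space look like, here is a minimal sketch. The class names are our own illustration and do not reflect the Ingres or SOL type definitions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class LineString:
    points: List[Point]

    def length(self) -> float:
        """Sum of the straight-line lengths of consecutive segments."""
        return sum(math.hypot(b.x - a.x, b.y - a.y)
                   for a, b in zip(self.points, self.points[1:]))


@dataclass(frozen=True)
class Polygon:
    ring: List[Point]  # exterior ring; the first vertex is not repeated

    def area(self) -> float:
        """Area of a simple polygon using the shoelace formula."""
        n = len(self.ring)
        twice_area = 0.0
        for i in range(n):
            a, b = self.ring[i], self.ring[(i + 1) % n]
            twice_area += a.x * b.y - b.x * a.y
        return abs(twice_area) / 2.0


road = LineString([Point(0, 0), Point(3, 4)])
park = Polygon([Point(0, 0), Point(2, 0), Point(2, 2), Point(0, 2)])
print(road.length(), park.area())
```

These types treat coordinates as positions on a flat plane. Without coordinate systems and geodetics, the same arithmetic gives misleading lengths and areas when the coordinates are latitudes and longitudes.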
Ingres Geospatial Project
Ingres' base of over 10,000 customers represents a
considerable amount of data and business. Since IT systems often
contain spatial data in the form of addresses, it is common for
customers and the community to ask what the company is doing in the
area of spatial technologies. For an open source company, it is a
significant problem when an in-demand component is not available as
open source. Out of customer and community interest and the emergence
of new standards, the Ingres geospatial project was born.
Power of Open Source
In "IT Doesn't Matter",
Carr notes that large IT suppliers such as Microsoft, Oracle, and IBM
are making huge amounts of money while companies overspend on IT. Carr
also notes that there is no correlation between IT spending and
superior performance. If anything, the relationship is the inverse.
Carr asserts that IT can be done more efficiently and inexpensively as
there is no strategic advantage to paying more for platform software.
Open source software (OSS), which is distributed for free and has
development costs spread across numerous firms, seems well positioned
as a commodity and poses a significant threat to the business of the
closed source market share leaders.
The success of OSS projects such as Linux, Apache, and Firefox
demonstrates that OSS can compete and be successful. In many cases, it
can even challenge the market leaders.
To Make a Change, First Look in the Mirror
With a code base that had only recently been re-opened, the Ingres open
source community struggled to compete with the enormous mindshare of MySQL.
As in the battle between VHS and Betamax, community developers did not
seem to pay much attention to the ways in which Ingres was technically
superior. It is fair to characterize Ingres' early days of returning to
its open source roots as "open code" but closed in other ways. While an
archive of the source code was available from the website, design
discussions, code inspections, the production code repository, product
roadmap information, and more were hidden behind the corporate
firewall. It does not make sense to be an open source company without
benefiting from an open source community.
Changes needed to make Ingres more open to community participation met
with resistance within the company. A company exists to "maximize
shareholder wealth" by making a lot of money. Open source and making
money are not at odds. However, in order to make money with open
source, you must first invest. For a startup with a sharp focus on
profitability, it is very difficult to set aside money and people to
work on something that may not generate a short term return. Despite
the odds, the decision was made to forge ahead, and investments were
made in infrastructure such as a public code repository, a bug tracking
system, public technical documentation, community mentorship, and
community management.
Survey of Reusable Components
It is worth explaining that much of our underlying technology, namely
the definitions of points, lines, and polygons and the functions for
operating on them, is a commodity. For the sake of this article, we call
this a "geometry engine". Given the importance of community, it was
important to look first at existing communities and code reuse. At the
top of the list of priorities was contributing to an existing code base
to make it stronger rather than creating yet another geometry engine.
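As an example of the kind of function a geometry engine provides, the sketch below tests whether a point lies inside a polygon using the classic ray-casting technique. It is an illustration only: it is not taken from any existing geometry engine, and it does not handle points exactly on the boundary, polygons with holes, or numerical robustness, all of which a production engine must address.

```python
def point_in_polygon(x, y, ring):
    """Return True if (x, y) is strictly inside the simple polygon `ring`.

    `ring` is a list of (x, y) vertices; the first vertex is not repeated.
    Behaviour for points exactly on the boundary is not defined here.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


lake = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, lake), point_in_polygon(5, 2, lake))
```

Every spatial database needs predicates like this one, which is why sharing a single well-tested implementation is more attractive than writing another.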
Contact was made with members of the OSGeo community who assisted in
identifying candidates for code reuse. The leading candidate was a
project called Geometry Engine Open Source (GEOS) originally developed by Refractions Research to enable PostGIS,
the geospatial plugin for PostgreSQL. GEOS had roughly 20 years' worth
of investment borne mostly by Refractions. A plan was assembled where
Ingres would adopt GEOS and contribute to the development of the code.
To make this proposition more attractive, Ingres and others
lobbied other companies in the OSGeo community to join. Eventually, the
GEOS project was moved to OSGeo, and code contributors came forward
from a number of companies. Each of the organizations involved benefits
from giving a little to the development of GEOS and receives much more
back in return.
OSS provides many benefits including sharing costs, risks, and ideas.
OSS enables swift development, open communications, and collaboration.
With closed source, just negotiating the legal agreements between the
multiple companies involved can take many months. With open source, new
companies and people can join the project without having to renegotiate
contracts, thus reducing transaction costs.
The geospatial industry is poised for tremendous growth as location
aware applications and devices grow in popularity. Enterprise IT will
discover new value and insights through spatial analysis of existing
data. Open source can reduce the transaction costs of technology
partnerships. Businesses should seek out partners with interests that
align through mutual investment and reuse of OSS. Doing so allows them
to re-allocate spending to areas that provide unique value.