Letters to the Editor
More on Relational Theory, Topology

I'd like to respond to Scott Morehouse's most recent comment and his paper referenced in other comments.

We agree on some things. In particular, the argument about recompiling business features from basic topological primitives at query time is closed.

I didn't think that the arguments put forward in the paper with respect to the weakness of relational theory as applied to spatial transactions were convincing, and said so. I think that relational theory can be made to work here, and I used Radius as an example that it can be, and is being, done. That said, let readers be under no illusion: there are some issues (again we agree), but we disagree on the solution.

I argue for full spatial topology to be added to O-RDBMS via declarative DDL and DML, through implementation of the SQL 3 MM and ISO TC 211 19107 (2003) standards.

The current SQL3 standard provides a rudimentary framework for topological referential integrity but it needs more work. Until we have a fuller definition and implementation we will be forced to implement non-standard approaches rather than model-centric approaches. For example, currently I can only implement topological referential integrity via triggers inside the database, whereas the real future is in clear definition within a self-descriptive model. So, I would rather see something like:

ALTER TABLE "Stop_Valve" ADD CONSTRAINT "Valve2Pipe" FOREIGN KEY "shape" TOPOLOGICALLY REFERENCES pipe(shape) BY MEET;

This is a declarative approach to topological referential integrity that is consistent with database theory.
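For contrast, the trigger-based workaround I am forced into today can be sketched roughly as follows. This is only a minimal illustration, written in Python against SQLite rather than Oracle; the table layout (a pipe as a single straight segment, a valve as a point), the trigger name and the helper functions are all assumptions made for the example, and MEET is reduced to "the point lies on the segment".

```python
import sqlite3

SCHEMA = """
CREATE TABLE pipe (
    id INTEGER PRIMARY KEY,
    x1 REAL, y1 REAL, x2 REAL, y2 REAL   -- one straight segment per pipe
);
CREATE TABLE stop_valve (
    id INTEGER PRIMARY KEY,
    x REAL, y REAL                       -- valve location
);
-- Procedural stand-in for a declarative topological constraint:
-- reject a valve unless it lies on some pipe segment.
CREATE TRIGGER valve2pipe BEFORE INSERT ON stop_valve
BEGIN
    SELECT RAISE(ABORT, 'stop_valve must meet a pipe')
    WHERE NOT EXISTS (
        SELECT 1 FROM pipe p
        -- collinear with the segment (cross product is ~0) ...
        WHERE abs((p.x2 - p.x1) * (NEW.y - p.y1)
                - (p.y2 - p.y1) * (NEW.x - p.x1)) < 1e-9
        -- ... and between its end points (dot product within [0, length^2])
          AND ((NEW.x - p.x1) * (p.x2 - p.x1) + (NEW.y - p.y1) * (p.y2 - p.y1))
              BETWEEN 0
                  AND ((p.x2 - p.x1) * (p.x2 - p.x1) + (p.y2 - p.y1) * (p.y2 - p.y1))
    );
END;
"""


def make_db():
    """Create an in-memory database holding one pipe from (0, 0) to (10, 0)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO pipe (x1, y1, x2, y2) VALUES (0, 0, 10, 0)")
    return conn


def insert_valve(conn, x, y):
    """Return True if the valve was accepted, False if the trigger rejected it."""
    try:
        conn.execute("INSERT INTO stop_valve (x, y) VALUES (?, ?)", (x, y))
        return True
    except sqlite3.DatabaseError:
        return False


if __name__ == "__main__":
    db = make_db()
    print(insert_valve(db, 5, 0))  # valve sitting on the pipe
    print(insert_valve(db, 5, 3))  # valve floating in space
```

The rule is enforced, but it lives in procedural code that nothing in the catalog describes as a topological relationship, which is exactly the gap a declarative constraint would close.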

Related to topological referential integrity (and the central issue at hand) is the need for topological data structures and methods to be supported within the database (which Radius has done). Because of this I also referenced the TP_DirectedEdge type within the topology class model of ISO TC 211 19107 (2003) standard as an example of a model-centric (and declarative) approach to topology to demonstrate that others agree with my thinking. Thus I appeal to the main industry players, of which ESRI is one, to work towards implementing what the customer (well, at least this one) wants.

We disagree about data duplication. Scott asks: "What application functions require access to the topological primitives?" Well, here are three real-life examples:

1. Labelling overlapping polygons
A particular business entity (one table) contains observations of phenomena defined by polygons. These phenomena overlap. Mapping them makes interpretation difficult because they must be mapped together. The end users want all "apparent" overlapping areas labelled with particular information from all observations affecting those areas. We initially achieved this via some sophisticated PL/SQL code and triggers (several hundred lines), but Radius has reduced this to a 10-line function and a simple materialized view! Here a business requirement is addressed by using the topological primitives to extend the data model held within the database. There is no need for application involvement as this is a data representational issue that is a function of the database.

2. Editing shared geometry
Many of our forest data are spatially very complex and share large amounts of linework. Changing the boundary of a forest stand can mean multiple edits: boundary features (better GPS data); the reserve itself; special management zones; the forest stand; etc. This can take days. With Radius, if we want to, we can go into the data and simply reshape the shared topological line without the need to extract the data and recompute the whole mathematical topological space within the edit application. The change is pushed from the "belt" to the "suspenders" ie from the topological primitive to the affected business entities' spatial descriptions so fast that the server load is barely worth reporting.

3. Rendering shared boundaries
We have a business requirement to draw the boundary between state forest and private land using an offset symbol based on line direction. This is trivial using the topology data maintained by Radius Topology, but very, very difficult with whole-object models like Oracle Spatial Sdo_Geometry, ArcSDE or Shapefile-based polygon data. The automatically maintained topological data contains, explicitly, the sections of linework that represent the shared boundary between the polygons, as well as the information about the polygons on either side of that boundary. Rendering of the state forest boundary now can occur from both the polygonal (shading) and the shared edges (line styling) to achieve the required outcome without the need for special computation within an external application.
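To make the third example a little more concrete, here is a small sketch of why explicitly stored shared edges help. It is written in Python with SQLite purely for illustration: the face/edge table layout, the toy data and every name in it are hypothetical, and it is not Radius Topology's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE face (id INTEGER PRIMARY KEY, tenure TEXT);
CREATE TABLE edge (
    id INTEGER PRIMARY KEY,
    coords TEXT,                              -- linework, stored once
    left_face  INTEGER REFERENCES face(id),   -- NULL means outside the map
    right_face INTEGER REFERENCES face(id)
);
"""


def make_db():
    """Build a toy topology: three faces and four directed edges."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO face VALUES (?, ?)",
        [(1, "state forest"), (2, "private"), (3, "state forest")],
    )
    conn.executemany(
        "INSERT INTO edge VALUES (?, ?, ?, ?)",
        [
            (10, "0 0, 0 5", 1, 2),     # forest on the left, private on the right
            (11, "0 5, 4 5", 2, 3),     # private on the left, forest on the right
            (12, "4 5, 4 0", 1, 3),     # forest on both sides: internal line
            (13, "4 0, 0 0", 1, None),  # edge of the mapped area
        ],
    )
    return conn


def forest_boundary(conn):
    """Edges separating state forest from private land, plus the side
    (relative to edge direction) on which the forest lies."""
    return conn.execute(
        """
        SELECT e.id,
               CASE WHEN l.tenure = 'state forest' THEN 'left' ELSE 'right' END
        FROM edge e
        JOIN face l ON l.id = e.left_face
        JOIN face r ON r.id = e.right_face
        WHERE (l.tenure = 'state forest' AND r.tenure = 'private')
           OR (l.tenure = 'private' AND r.tenure = 'state forest')
        ORDER BY e.id
        """
    ).fetchall()


if __name__ == "__main__":
    for edge_id, side in forest_boundary(make_db()):
        print(edge_id, side)
```

Because each edge row carries its left and right face, the offset direction for the line symbol falls out of a plain query. The same layout suggests why the second example is cheap: reshaping a shared line is one update to one edge row, not an edit to every polygon that uses it.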

Many other examples exist but I do not include them here.

I responded to the paper's determination that "we [ESRI] could support these features without the additional overhead of maintaining topology primitives as database rows" by simply asking the question: why is ESRI making this decision about data that belongs to us?

While I might have over-stepped the mark by characterising this approach as being "patriarchal," I do not withdraw from the view that it should be up to the customer to decide what data model to implement and not the vendor.

The assertion that ArcGIS's implementation is closed and non-standard was not made in relation to the Simple Features Specification compliant vector data that ArcSDE stores within any dbms. I do realise and accept that ESRI has achieved a compliance certificate in relation to this data.

My comment related to the application approach taken: I was thinking in terms of database theory. Let me clarify:

• Computing science delivers good theoretical frameworks for data management. Those frameworks are based on conceptual models: hierarchical, network, relational, object-relational and object-oriented. The model that is best understood and has the best foundation in mathematics is the relational, and the best products in use today are the object-relational ones.
• Fundamental to database theory is the idea, expressed as a dictum: "logical separation of data from its physical implementation"; this "design pattern" also applies to applications and databases. Much of our existing data management capability within GIS software operates according to a 1960s model for data storage: ESRI's own coverages, shapefiles (the only documented format that I am aware of ESRI having made available) and the internal structure of the LONG RAW ArcSDE format are examples of this. (So that this is not seen as an unfair criticism, I note that all vendors transgress in this area.) We simply must move beyond this if our industry is to become mainstream and grow.
• Database theory requires that database data models be "self descriptive": they should NOT require an application to impart understanding about the underlying data. Thus a database "catalog" is a core element of a dbms. This is why I do not like the ArcSDE "Topology in the middle tier" approach: it is "external to the model". That is, any organisation that implements ArcSDE topology has to have an application external to the database (ie ArcSDE) in order to interpret the data and its relationships. This breaks the theoretical underpinnings of databases.
• Applications should be able to access the data via logical "declarative" languages that implement the semantics of the model through syntactically simple access languages like SQL. These access languages are broadly divided by the core areas of Definition (DDL) and Manipulation (DML).
• This has been realised in database management products ubiquitous in general IT. Their popularity testifies to the quality of the underlying science and its effectiveness in generic data management. Proprietary extensions to SQL give vendors particular marketing/technological edges, but these can be easily-enough isolated in application-building by sticking to agreed SQL standards (eg SQL92, SQL3).
• Early implementations of relational (and hierarchical/network) theory provided no data type support beyond the minimum: numbers, strings and dates. Abstract Data Types (and their math) are a way of describing non-traditional data types. ADTs applied to our chosen profession means spatial data types and methods.

Most mainstream O-RDBMS vendors can handle spatial data types. Microsoft SQL Server can't, but many await the XML-capable Yukon (aka SQL Server 2005), and many hope that MS's work with Terraserver etc results in full spatial data type support inside the database via DDL and DML.

We can now create self-describing spatially-enabled data models in the database tier, with the DDL/DML to access the data regardless of application. Spatial data management doesn't mean dropping the traditional data management approach outlined above.

I don't wholly agree that "the best model for interoperability and information sharing is one of simple features (features with clean geometry) rather than a much more complex model of interrelated topological primitives". I agree that customers don't need to know "what does it mean to delete a shared edge", but only because this should be part of the database vendors' implementation of declarative topological referential integrity.

Nor do I agree that interoperability should be facilitated via vendor specific APIs. In this arena I believe we have not yet addressed the first and primary real need which is that developers need ODBC/JDBC/OLEDB/ADO etc drivers that provide a logical, standardised, spatial SQL interface to databases. Unless the house is built on rock, in vain do the laborers labor.

I do think that robust topological computational power derived from a declared model is vital to any GIS edit software: again we agree. Just as an attribute editing application should derive its data validation rules from the database (i.e., the "catalog") but implement them in the client, so topological and spatial edit functionality within client software like ArcEditor should draw all its edit rules from the shared catalog. What is valid for the attribute world is true for GIS data. Even so, none of this should preclude the storage of topological primitives in the database if the model requires it.
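The attribute-world half of that analogy can be sketched in a few lines. The example below is an illustration only, using Python and SQLite; the forest_stand table and the two helper functions are hypothetical names. The client reads its "required field" rules from the database catalog instead of hard-coding them.

```python
import sqlite3


def make_db():
    """Create a toy table whose constraints live in the catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE forest_stand (
            stand_id INTEGER PRIMARY KEY,
            species  TEXT NOT NULL,
            area_ha  REAL NOT NULL,
            notes    TEXT
        )
        """
    )
    return conn


def required_columns(conn, table):
    """Read the NOT NULL columns of `table` from the catalog.

    PRAGMA arguments cannot be bound as parameters, so `table` must be a
    trusted identifier in this sketch.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [name for _cid, name, _type, notnull, _dflt, _pk in rows if notnull]


def validate(record, required):
    """Client-side check driven by rules that came from the database.

    Returns the list of required columns missing from `record`.
    """
    return [col for col in required if record.get(col) is None]


if __name__ == "__main__":
    db = make_db()
    rules = required_columns(db, "forest_stand")
    print(validate({"species": "E. regnans"}, rules))
```

If the table definition changes, the client picks up the new rule on its next read of the catalog. My argument is that topological edit rules should be obtainable the same way.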

So, I do think we need a "belt and suspenders" approach. All GIS vendors should drop their reticence about this and throw their weight behind having it implemented across database vendors. Once the focus changes from the "application" to the database, customers should then have access to software services that can be deployed in either the database or application tier as they see fit. If I put this view rather forcefully it is because I have waited for nearly 20 years for geospatial data to be a true data type within commercial O-RDBMS; I don't want to wait another 20 till the final pieces in the puzzle are put in place.

Scott finds it strange that I consider Oracle Spatial and Radius Topology "open" and "non-proprietary" but ArcGIS "closed" and "proprietary", by arguing that Oracle is as proprietary as anything from ESRI. I disagree that Oracle Spatial is proprietary (in the narrow pejorative sense) for the following reasons:

1. Sdo_Geometry, though a "proprietary" data type, is a version of ST_Geometry that is implemented using predominantly SQL3 MM data types like arrays (called varray in Oracle), User Defined Types (UDTs i.e., ADTs), procedures, functions, operators and type constructors.
2. The internal structure of the vector data contained in Sdo_Geometry conforms to the Simple Features Specification for Spatial data.
3. Sdo_Geometry is fully accessible to any Oracle client: from internal PL/SQL triggers, packages and functions, to any client from SQL-plus through any application.
4. It really is irrelevant what underlying physical structures vendors use to implement data types within databases as long as the data model is declarative and logical access to the data is provided via applications. As outlined above, database theory insists on this. The argument that Oracle Spatial data structures are proprietary equally applies to NUMBERS, DATES, STRINGS etc. Does anyone in general IT make such an argument against storing NUMBERS etc inside Oracle (or any database)? From an organisation's point of view, it is all just data.

Oracle at 10g has started to make available the SQL 3 MM data types so that I can now do this:
create table Road (
featureid number(10),
centerline ST_LineString,
casement ST_Polygon);
(In case readers don't know, the SQL MM types at 10g are: ST_CircularString, ST_CompoundCurve, ST_Curve, ST_CurvePolygon, ST_GeomCollection, ST_Geometry, ST_LineString, ST_MultiCurve, ST_MultiLineString, ST_MultiPoint, ST_MultiPolygon, ST_MultiSurface, ST_Point, ST_Polygon and ST_Surface.)

Finally, to return to ESRI's LONGRAW SDEBINARY structures inside Oracle. Can anyone access this from any non-ESRI client? This approach breaks the fundamental rule of database management that interpretation of data inside a database should not require an external application. In this case ONLY ESRI clients and middle-tier technology can interpret this data.
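By way of contrast, a documented encoding can be read by anyone. The sketch below decodes a point in Well-Known Binary (WKB), the binary form defined by the Simple Features Specification, using nothing but the Python standard library. The function names are mine, and the sketch handles 2D points only; it says nothing about the internals of SDEBINARY.

```python
import struct


def encode_wkb_point(x, y):
    """Little-endian WKB: byte order 1, geometry type 1 (Point), two doubles."""
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_wkb_point(blob):
    """Decode a 21-byte WKB Point in either byte order into (x, y)."""
    # First byte states the byte order: 1 = little-endian, 0 = big-endian.
    prefix = "<" if blob[0] == 1 else ">"
    geom_type, x, y = struct.unpack(prefix + "Idd", blob[1:21])
    if geom_type != 1:
        raise ValueError("not a WKB Point")
    return x, y


if __name__ == "__main__":
    blob = encode_wkb_point(147.3, -42.9)
    print(blob.hex())
    print(decode_wkb_point(blob))
```

Nothing here depends on a particular vendor's client or middle tier: the layout is published, so a dozen lines in any language will do.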

Oracle's approach to versioning via their Workspace Manager database services (which Intergraph use in GeoMedia for their versioning) might be more pertinently described as being "proprietary", but to my knowledge there is no standard on what versioning should entail in terms of DDL and DML. Regardless, the services are agnostic: they are available to all applications via SQL as is required by database theory. Again, my view is that even here, GIS vendors should be working with database vendors on a database and data model centric approach and not enforcing the need for data model logic to exist inside their middle-tier applications.

Oracle's approach (like that of other database vendors) is always to try to implement features, like spatial and versioning, in ways that are consonant with database theory and practice and that conform to all required and relevant standards.

My view is that Laser-Scan also takes the same approach. It has made a successful transition from the application-centric computing approach of traditional GIS to the more open world of database-centric, generic data management with Radius. Its approach is in major agreement with those standards that describe topological primitives. This openness is why I think Radius is superior to ArcSDE's Topology. Although Scott and I will disagree about this, I hope the readers can see that my views are well-founded in science and experience.

I think that Scott and I have given all Directions Magazine readers something to think about. It is now up to them to decide where they think the best solution lies.

In the end, time will be the ultimate judge of my views: by 2008 I reckon some "Large GIS Consultant(s)" will owe me a beer.

I would also like to close by expressing regret that my previous postings had elements of intemperateness about them that might lead people to think that I am profoundly anti-ESRI in my world view. This could not be further from the truth. I have nearly 20 years' experience with ESRI software and have high regard for its capabilities in the area of geoprocessing. And I still actively use ESRI products as they have a valuable role to play in enterprise computing.

Please note that these opinions are mine and not those of my employer.

Simon Greener

PS: It would seem that there is hope in the ESRI tunnel as I noted this from a Directions Magazine news feed:

"There were some signs that the ice may be breaking between Oracle and ESRI. Some conciliatory tones were being struck by key executives. David Maguire mentioned that new links to Oracle were being implemented and he also remarked that there are certain business cases for when a 'database centric' approach may be a suitable option rather than an N-tier approach, strongly pushed by ESRI, for certain application implementations."

Database-centric approach: QED.
