Technical and Design Issues in Radius/ArcSDE
Editor's Note: This letter references the article
"An Overview of Laser-Scan's Radius Topology 2.5" and its comments.
Thanks Sherry for the reference to the article by Scott Morehouse et
al. I didn't know it existed and am thankful for the reference. I have
read it to ensure that what I say comes from a better-informed
position than it otherwise would have.
There are some serious technical and design (never mind marketing)
issues raised by the technology under discussion. These issues, and
the perspectives the companies and individuals bring, are, like it or
not, influencing geospatial technological trends in the short to
medium term in a way that the industry cannot avoid.
These issues cannot be adequately addressed in this forum, so I invite
anyone who wishes to discuss these with me off-line to do so at my work
email address of simon.greener(at)forestrytas.com.au. However, I will
endeavour to make some points in order to get people thinking.
I don't want to get into a comprehensive critique of the ESRI approach
as described in the paper. After all, it is a paper in a journal, and
this is not the place for such a critique. But there are equally "a
number of serious problems with [the ESRI] implementation" as well.
Like the old joke about lawyers and opinions (or competing database
designs), one is tempted to say that there is no absolutely right
answer. (We should always be careful of arguments that say it is EITHER
one thing OR another, because usually it is a case of "BOTH AND"). But,
I think that some approaches are fundamentally more sound and will
produce more fruit. Can I convince anyone of that? No. But I can put
forward some ideas that might help buyers make a more informed choice.
For example, in the introduction of the Morehouse et al. paper, the
authors admit that their approach is "unconventional". Now
unconventional means that they are putting themselves outside the
received wisdom, experience and research trends and outcomes of a very,
very large and fundamental component of all information technology -
databases. This is NOT to say that ideas from small companies can't
affect or change received wisdom and practice, but in this case there
is much evidence that points away from ESRI.
I am not aligned to either ESRI or Laser-Scan. However, my employer,
Forestry Tasmania, has BOTH ESRI (ArcSDE) and Radius technology: in
fact, we chose Radius over ArcSDE in a clear and open market.
And finally, before I begin, I point out that the approach put forward
by Radius has been accepted by almost all ESRI's competitors because
Laser-Scan's approach is far more vendor neutral than ESRI's. This
means that a whole host of technical people in Autodesk, Intergraph,
MapInfo etc have evaluated Radius far more than we did and it has not
been found wanting. Radius is NOT a competitor for ArcSDE. Radius is a
Plug-In for the Oracle database and does not demand the installation
and use of proprietary technologies for accessing data stored therein.
If anything, ArcSDE is a competitor to Oracle Spatial!
The quote Sherry included, which I included in edited form above,
also referred to the "conventional relational topology model". One of
the problems with referencing the Morehouse et al. paper (in order to
encourage a comparison of ESRI vs. Radius) is that one is not
comparing apples with apples.
In the paper the authors assume as a baseline for critique the purely
"relational" model. They also assume that whole features (like
polygons) must always be created by complex (and computationally
expensive) relational joins at query time when accessing and
visualising data. This may be the case with ArcInfo's coverage data
model, wherein polygons are constructed from the Polygon Arc List (PAL)
file from the underlying ARCs etc (at access time eg POLYGONS command
in ArcPlot), but this is decidedly NOT the case with Radius. The paper
goes on to advocate the opposite approach by insisting that it is
better to compute topology when it is needed (e.g. editing or in
topological queries) but to store whole features (which is the case in
all versions of ArcSDE from its initial inception as SDBE). Cf "We have
chosen to simplify and streamline the generic explicit topology model
and to not persist both representations."
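To make the two storage choices concrete, here is a minimal sketch in Python. It is not taken from ArcInfo, ArcSDE or Radius: the tables ARCS and POLYGON_ARC_LIST and the function assemble_ring are hypothetical names, and the data is made up. It shows a polygon being assembled from its arcs at access time (the join the paper objects to), next to the "whole feature" alternative in which the finished ring is simply stored and read back.

```python
# Hypothetical arc table: arc id -> ordered list of (x, y) vertices.
ARCS = {
    1: [(0, 0), (4, 0)],
    2: [(4, 0), (4, 3)],
    3: [(0, 0), (4, 3)],
}

# Hypothetical polygon-arc list: polygon id -> signed arc ids.
# A negative id means the arc is traversed in reverse.
POLYGON_ARC_LIST = {
    100: [1, 2, -3],
}


def assemble_ring(polygon_id):
    """Build a closed ring by walking the polygon's arcs (the 'join')."""
    ring = []
    for signed_id in POLYGON_ARC_LIST[polygon_id]:
        vertices = ARCS[abs(signed_id)]
        if signed_id < 0:
            vertices = list(reversed(vertices))
        # The first vertex of each later arc repeats the previous end point.
        ring.extend(vertices if not ring else vertices[1:])
    return ring


# The "whole feature" alternative: the ring is stored once and read back
# without touching the arc tables at query time.
WHOLE_FEATURES = {100: assemble_ring(100)}
```

Persisting both representations, as described below for Radius, amounts to keeping ARCS, POLYGON_ARC_LIST and WHOLE_FEATURES side by side and keeping them in step.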
Is the choice really as binary as the paper's EITHER/OR argument
indicates?
No.
Radius, based on Oracle's Object extensions to the base relational
theory, actually promotes a BOTH/AND argument in which greater choice
is offered to the customer. One can have the best of both worlds
because in Radius BOTH forms of data storage are supported (i.e. it
"persists both representations"). So, one serious problem in the
Morehouse et al. paper, that of "the performance of typical queries", is
eliminated in Radius and so cannot be a point of (unfavourable)
comparison.
Radius also implements a mathematical "manifold" which provides a
mathematical space in which multiple datasets can share a set of
topological primitives. This extends topology and topological
data management from the restrictive single-dataset case (which seems
to be the implied norm in the Morehouse paper) to one in which multiple
geospatial datasets can share underlying common primitives. This
approach really delivers the Holy Grail of "vertical topology" which
was one of the things that TIGRIS attempted to do and which has always
been a foundational goal of Object Databases as applied to geospatial
data. (BTW Dr John Herring, the author of TIGRIS, now works for Oracle
Spatial as its chief systems architect, so customers can rest assured
that Oracle Spatial, and by extension Radius, is based on sound
relational, topological and mathematical theory!)
For those who can't envisage what I say, perhaps you can think of this
as having an ArcInfo coverage in which one can have multiple PATs
representing quite different business objects!
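A small Python sketch may help picture the idea of several datasets sharing one set of primitives. It is an illustration only, not the Radius data model: EDGES, DATASETS and the three functions are hypothetical names, and the geometry is invented. Two business datasets reference the same edge, so reshaping that edge once changes the boundary of both features.

```python
# Hypothetical shared primitives: edge id -> list of (x, y) vertices.
EDGES = {
    "e1": [(0, 0), (10, 0)],
    "e2": [(10, 0), (10, 10)],   # common boundary between the two features
    "e3": [(10, 10), (0, 10)],
    "e4": [(0, 10), (0, 0)],
    "e5": [(10, 0), (20, 0)],
    "e6": [(20, 0), (20, 10)],
    "e7": [(20, 10), (10, 10)],
}

# Two distinct business datasets built on the same primitives.
DATASETS = {
    "forest_stands": {"stand_A": ["e1", "e2", "e3", "e4"]},
    "streamside_reserves": {"reserve_R": ["e5", "e6", "e7", "e2"]},
}


def features_sharing(edge_id):
    """List every (dataset, feature) pair whose boundary uses this edge."""
    return sorted(
        (dataset_name, feature_id)
        for dataset_name, dataset in DATASETS.items()
        for feature_id, edge_ids in dataset.items()
        if edge_id in edge_ids
    )


def boundary_vertices(dataset_name, feature_id):
    """Collect the vertices of all edges that bound a feature."""
    vertices = set()
    for edge_id in DATASETS[dataset_name][feature_id]:
        vertices.update(EDGES[edge_id])
    return vertices


def reshape_edge(edge_id, new_vertices):
    """Edit a shared edge once; every feature that uses it sees the change."""
    EDGES[edge_id] = list(new_vertices)
```

Calling reshape_edge("e2", [(10, 0), (11, 5), (10, 10)]) moves the common boundary for both the stand and the reserve in a single edit, which is the editing saving described in the next paragraph.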
This is no abstract idea that has no resonance in business. We have
Radius manifolds in place that allow us to model our fundamental
business data more accurately AND to implement more efficient edit
processes (and thus cost savings) by having some very complex business
objects (forest stands, vegetation polygons, stream side reserves,
special management zones, administration areas etc.) share
topological primitives. We can do this INSIDE the Oracle database
in a way that is INDEPENDENT of any one GIS vendor precisely because we
use Radius. We simply cannot do this with ESRI's approach.
There are other benefits for having both formats "persist" in the
database. Real businesses find themselves in situations in which they
need to create transient or new geometries in order to address many
different needs: from efficient support for advanced spatial querying,
to complex rendering (cf "dropline" style rendering that is somewhat
akin to Dynamic Segmentation), business process implementation etc.
The Oracle Spatial and Radius approach allows us to choose when and how
to deploy critical functionality. We can use database triggers to
execute spatial queries and processing independently of the need to use
"external" processing applications (which is demanded by the ESRI
approach); we can choose to use Oracle Materialized Views based on a
set of topological primitives and joins that construct new or transient
objects and so avoid almost all the computational costs that Morehouse
et al warn us about.
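The pattern of doing derived work inside the database can be sketched as follows. This is an assumption-laden illustration: SQLite (via Python's standard sqlite3 module) stands in for Oracle, SQLite has no materialized views so a trigger-maintained table plays that role, the table and trigger names are hypothetical, and a perimeter total stands in for a constructed geometry. No Oracle or Radius syntax is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised primitives and the features that reference them.
CREATE TABLE edge (edge_id INTEGER PRIMARY KEY, length_m REAL NOT NULL);
CREATE TABLE feature_edge (feature_id INTEGER NOT NULL, edge_id INTEGER NOT NULL);

-- Denormalised, precomputed result (the stand-in for a materialized view).
CREATE TABLE feature_perimeter (feature_id INTEGER PRIMARY KEY, perimeter_m REAL);

-- The database itself refreshes the derived rows when a primitive changes.
CREATE TRIGGER edge_length_changed
AFTER UPDATE OF length_m ON edge
BEGIN
    UPDATE feature_perimeter
       SET perimeter_m = (SELECT SUM(e.length_m)
                            FROM feature_edge fe
                            JOIN edge e ON e.edge_id = fe.edge_id
                           WHERE fe.feature_id = feature_perimeter.feature_id)
     WHERE feature_id IN (SELECT feature_id FROM feature_edge
                           WHERE edge_id = NEW.edge_id);
END;

INSERT INTO edge VALUES (1, 10.0), (2, 10.0), (3, 5.0);
INSERT INTO feature_edge VALUES (1, 1), (1, 2), (2, 2), (2, 3);

INSERT INTO feature_perimeter
SELECT fe.feature_id, SUM(e.length_m)
  FROM feature_edge fe JOIN edge e ON e.edge_id = fe.edge_id
 GROUP BY fe.feature_id;
""")

# Any client can change the shared edge; no external application is needed
# to keep the derived table current.
conn.execute("UPDATE edge SET length_m = 12.0 WHERE edge_id = 2")
rows = conn.execute(
    "SELECT feature_id, perimeter_m FROM feature_perimeter ORDER BY feature_id"
).fetchall()
```

Readers then query feature_perimeter directly and pay no join cost, while the join is paid once, inside the database, when the shared edge is edited.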
Thus, Oracle Spatial with its Radius plug-in provides a much greater
semantic framework in which to model and access data. That it has
potential integrity problems (Morehouse et al. note potential problems
with primary/foreign keys and null, and also highlights "the
maintenance of semantic integrity in the model" as a problem) is true
in the abstract, but none of these has stopped either Oracle or Radius.
Relational theory does at least allow us to build self-referential
databases that are self-documenting and application independent. RDBMS
allow users to create databases that are complete messes; Radius could
have done this when they implemented topological manifolds in Oracle.
But the reality is that they didn't.
As another quick aside (see later), Morehouse et al. make no reference
to the work involved in ISO/IEC 13249-3 "Information technology.
Database languages. SQL multimedia and application packages. Part 3:
Spatial". Given ESRI's public commitment to standards this is a serious
oversight. Also, the proposed standard is, by its very existence, a
critique of the Morehouse et al. paper.
The main "serious" problem that Morehouse et al. identify is that of the
"performance and complexity of typical updates". Their main concern, I
think, is that such updates are computationally expensive, slow to
perform and thus require additional server hardware.
By extension, the question of the load that this might put on the
database server comes into play. As an IT database practitioner for
nearly 20 years, I am well aware of the "database as application
server" argument that any discussion of PL/SQL as a platform for
implementing business logic provokes. (This is one of those "party
stoppers" akin to asking 10 viticulturalists which clone of Pinot Noir
is best. Bring up this topic, stand back and watch the sparks fly!)
However, the plug-in technology that Radius brings to the Oracle
database is very, very stable and incredibly fast. As such, this may
or may not be a problem, depending on the complexity and size of the
transactions involved. It is difficult, therefore, to be so categorical
as to say that it needs a separate server approach. My view
is that if the load becomes a problem (it is not at the moment for us),
then Moore's Law and commodity server pricing come to the rescue in
that I would prefer to purchase a new dual Xeon Windows/Linux server
for just my Radius-enabled Oracle Spatial database (costing around
AUD5,000) than purchase a second application server that would have to
be dedicated to the compute intensive work that Morehouse et al
envisage. (Bit of a no-brainer to me.)
What one group sees as a problem to be avoided, others tackle
head-on, using their intellect to come up with algorithms and
implementations that solve or mitigate the problem. Thus we have
innovation and, with innovation, competition and choice!
Morehouse et al also seem not to be aware of an old database adage
which is commonly expressed as: "normalise for edit, denormalise for
performance." Normalisation means transactional cost but it means data
that is semantically correct and integral.
Denormalise for performance is what we do to fully normalised databases
(cf Codd) when we want to query them (eg Star, Snowflake etc schemas
and MD data structures in OLAP). Normalised data to Morehouse is the
topological data; Denormalised, the "whole feature" model of ArcSDE and
SDO_GEOMETRY as in O-RDBMS. Radius implements both so this issue is
almost non-existent. Where we need new or derived objects,
Materialized Views (cf above) allow us to denormalise efficiently and
effectively. Wonderful choice because we have chosen to use the full
power of the database!
However, the "update performance" issue is real, but it only affects
two small aspects of the current version of Radius.
1. Updates to the derived topology within a manifold are done
synchronously with the update of the whole object (performance issue);
The synchronous rebuild of the topology has the potential to be time
consuming (it might cause an operator to think the update has hung and
press Ctrl-C!).
I have indicated to Laser-Scan a possible approach/solution to the
synchronous nature of the updates, and solutions are being considered
for a later release. Because the current situation is implemented
in PL/SQL I could, if I wished to, rebuild the PL/SQL to implement our
own asynchronous rebuild (using DBMS_JOBs) but this would create
support difficulties with Laser-Scan. Currently the issue is not so
great as to require such radical intervention (a testimony to the speed
of the Radius software).
2. A failure to build the derived topology may cause the update of the
main, business object to fail.
However, a failure to topologically structure a business object does
not necessarily cause the whole transaction to fail. This is actually
up to the application (and so is in the domain/power of the user to
decide). What happens is that when a structuring error occurs an Oracle
exception is raised with a status indicating the severity.
Admittedly, at this point most applications would rollback but this
need not necessarily be the case.
Because Radius persists BOTH the whole object (ESRI case) and the
topology, it is possible to separate the update to the business object
(critical) from the construction of the topology (derived). This is
actually what happens at load time: the base data is loaded into the
business tables and then these tables are added to particular manifolds
(bulk structuring). If a topological problem occurs, this is recorded
in this system and can be viewed by editors (via those wonderful things
- Oracle views) in order to correct the problem. Thus Radius can handle
dirty data and dirty edits quite comfortably (cf Morehouse et al.'s
"Dirty Areas").
For example, during bulk structuring from one table to another, if a
feature fails, the transaction is rolled back, an error is logged in
the error table and then the feature is copied across with the Radius
triggers disabled. The result is that all features get copied across,
but those that failed to structure are not part of any topological
manifold. They can be easily identified and the reason for the failure
is in the error table.
An important point is that, at all times, the topological manifold is
correctly maintained and contains the correct set of topological
primitives for those features _in the manifold_. This is absolutely
essential from a data consistency point of view. Business objects
which fail to structure are not added to the manifold, but are still
available for other purposes, and are easily identifiable.
Remember, in the Morehouse EITHER/OR approach this problem presents as
a HUGE problem precisely because a transaction that failed
topological validation would mean that the "real" business object would
not be available to the database (and dependent applications) until the
error is fixed. Radius does not have this problem because of its
BOTH/AND approach.
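The separation described in points 1 and 2 above can be sketched in a few lines of Python. This is not how Radius is implemented: the in-process queue merely stands in for a scheduled database job, structure() is a toy validity check rather than real topological structuring, and every name here is hypothetical. The business object is committed at once, the topology is built later, and a failure is logged without removing the object from the business table.

```python
from collections import deque

business_table = {}   # feature id -> whole geometry, usable immediately
manifold = {}         # feature id -> derived edges, only if structuring succeeded
error_table = []      # (feature id, reason) for features that failed to structure
pending = deque()     # feature ids waiting for (re)structuring


def structure(ring):
    """Toy stand-in for structuring: split a closed ring into its edges."""
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("ring is not closed")
    return list(zip(ring, ring[1:]))


def update_feature(feature_id, ring):
    """Commit the business object now and defer the topology build."""
    business_table[feature_id] = ring
    manifold.pop(feature_id, None)   # never serve topology for an old shape
    if feature_id not in pending:
        pending.append(feature_id)


def run_pending():
    """What a scheduled job might do: structure queued features, log failures."""
    while pending:
        feature_id = pending.popleft()
        try:
            manifold[feature_id] = structure(business_table[feature_id])
        except ValueError as exc:
            # The feature stays in business_table but out of the manifold.
            error_table.append((feature_id, str(exc)))
```

After update_feature() returns, the feature can be queried and drawn even though run_pending() has not yet run; a feature that fails to structure appears in error_table with its reason and is simply absent from the manifold.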
Finally, one of the fundamental underlying technological restrictions
in ArcSDE is that ALL topological comparisons from SDBE 1.0 are done by
coordinate comparisons in the middle-tier. While this does provide a
measure of consistency across database platforms (something Oracle and
Radius don't have to worry about), it means that ArcSDE places a
restriction on the client's use of, and investment in, Oracle Spatial.
So, even where ArcSDE uses SDO_GEOMETRY as the storage format within
Oracle (as against SDEBINARY), ArcSDE does NOT use SDO_RELATE when
querying the database and returning records to any of its clients. (At
least this is the case up to ArcSDE 8.3 - we have not installed ArcSDE
9.0 so I can't say directly whether anything has changed from 8.3 to
9.0: unlikely.) To change this fundamental approach would require a
huge amount of re-engineering, so any design has to fit what is
technologically and financially possible. When one realises this, and
the relatively late release of the Morehouse et al. paper and
implementation (at ArcSDE 8.3), one can't help thinking of the chicken
and the egg: which one is coming first?
Summary
Morehouse et al. are in danger of creating a "straw man" in order to
knock it down. They assert "serious problems" without any empirical
evidence to back their case. They attack the problem with what seems
to me a pretty limited understanding of database and related
technologies. Their approach is restrictive due to the very "binary"
EITHER/OR logic they use to lay out their case. Their approach
perpetuates a "geocentric" view of life that simply cannot and will not
stand the test of time (see my December 2004 Soapbox). Finally, they
offer, in both paper and product, an approach that restricts what an
organisation can do with its own data.
No thanks.
My view is that just because one group of men see problems in the
relational model, it does not mean that the solution should be to throw
away some fundamental maths and science just because we are dealing
with spatial data. The relational model is robust.
It works and it can be made to work efficiently and effectively in the
spatial field. When "object" extensibility and database features not
available in ESRI software (eg materialized views) are added in, we get
some real power. But one man's problem is another man's algorithm and
thus marketing edge!
In the end, Morehouse et al state that their approach "uses an
unconventional approach to the problem of database integrity and
transaction management". I am unconvinced that an unconventional
approach, with its associated high software and support costs (made
more untenable because they are the only ones in the whole of the IT
world who seem to think that their issues are so special that they
need to go down a different path to everyone else), is warranted.
What I would prefer to see is ESRI Inc put the fine minds of those who
work for it into working collaboratively with the database vendors to
implement SQL3/MM rather than launch out on their own. Implementation
of this standard, which includes topological data structures (cf
TP_DirectedEdge), would move us into a world where implementers of
fully spatial business systems get to choose their data models and
decide where to deploy key features rather than have vendors force
particular products down our throats. Oracle is working towards full
SQL3/MM implementation: Radius is better positioned to leverage this
than its only direct competitor: ESRI.
James Larson's almost "marketing speak" listing of the virtues of
ESRI's approach hasn't really moved the debate forward. There are many
assertions in his list that I will not challenge, either because I
agree with some (the cross-database vendor perspective that ESRI has
that Oracle and Radius don't) or because I simply have not the time,
having already worn out my welcome with this posting. But one assertion
needs to be clarified: that is the ESRI claim that their SDEBINARY
approach offers the best performance with regard to spatial data within
Oracle. I put out the challenge for some independent third-party to
conduct a shoot out using their latest technology offerings in order to
provide some more objective figures. A sort of PC-Labs database
shootout. I would love to do it but I have a wife, family, vineyard and
fulltime employer!
Currently, on Oracle, the choice is a NO BRAINER. Outside of that, all
bets are off. But the Radius/SQL3MM approach would not be that hard to
do inside Informix IDS.
In all technology debates, one must always ensure that one is comparing
apples with apples. We must also remember that one man's (or company's)
approach does not signify correctness. When assertions come from a
dominant player in an industry with almost monopolistic power, it is
important not to take what they or their consultants and business
partners say too literally. The onus is on everyone to argue
disinterestedly for the common good.
Simon Greener
GIS Manager
Forestry Tasmania