Geocoding, Data Quality and ETL
By: Hal Reid
(Oct 26, 2005)
This month's Location
Intelligence Magazine explores three areas: geocoding, data quality
and ETL (extract, transform, load). All three are inter-related, and
together they help us understand where things are in terms of the
enterprise systems for assets, customers and business development.
In data-driven systems, there is a perpetual quest for clean data. We
want data that is correct and stays that way, data we can reliably use
to understand, to make decisions and even, with confidence, as a
mailing list.
But data quality and usability don't happen by chance. It is
interesting to look at an example of something we take for granted
today, something that just a few years ago was pretty imprecise or
almost non-existent: accurate geocoding. How about finding a route
from your house to your friend's new house anyplace in the U.S. or
Western Europe? It is pretty simple today, either via the Web or with
an inexpensive desktop mapping program. But putting that
infrastructure in place so it works the first time, every time, was
not trivial.
Several years ago, Matt Jaro, one of the world's experts on geocoding,
was telling me about some of the issues in geocoding: things like the
use of soundex (it sounds like...) and reverse soundex (it doesn't sound
like...but really is...), non-linear address ranges and the problems of
parsing an address. For example, 123 Main Street is pretty
straightforward, and so is 123 Main St., but the two strings are not
the same: St versus Street. How about Maine Street or Mane Street?
Main, Maine and Mane all sound the same. Hmmm, which one is correct,
and how do we know which one is right? Then there is Sherlock Holmes'
address, 221B Baker St. Is the B a phonetic element or part of the
address?
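
To make the "it sounds like" idea concrete, here is a minimal sketch of
the classic American Soundex code in Python. It is an illustration
only, not the matching logic of any particular geocoder, and the
function name soundex is simply a label chosen for this example.

```python
# Letter groups for American Soundex: each group shares one digit.
CODES = {}
for digit, letters in enumerate(
    ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1
):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character Soundex code for a name ('' if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]                # the first letter is kept as-is
    prev = CODES.get(letters[0], "")     # its code still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                     # H and W do not separate equal codes
        code = CODES.get(ch, "")         # vowels and Y have no code
        if code and code != prev:
            result.append(code)
        prev = code                      # a vowel resets the repeat check
    return ("".join(result) + "000")[:4]  # pad or truncate to 4 characters


for street in ("Main", "Maine", "Mane"):
    print(street, soundex(street))       # all three print M500
```

All three spellings collapse to the same code, which is exactly the
ambiguity described above: a phonetic match can tell you that the
candidates sound alike, but not which one is right.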
In Japan, the address number is determined by when a building was
built, not by where it sits on the street. So #1 Honda St is not
necessarily next to, or even across the street from, #2 Honda St. A
common worldwide problem is the abbreviations used in addresses;
another is simply the misspelling of the address. You can see that
getting a good geocode is not trivial. With geocoding, the problem
lies both with the original data and with the end user.
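
The abbreviation problem can be sketched in a few lines of Python. This
is a hedged illustration, not a production address parser: the SUFFIXES
table is a tiny hypothetical sample, the function name
normalize_address is made up for this example, and only the last token
is treated as a street type so that a leading "St" (as in Saint) is
left alone.

```python
import re

# Hypothetical sample of street-type spellings mapped to one canonical form.
SUFFIXES = {
    "STREET": "ST", "ST": "ST",
    "AVENUE": "AVE", "AVE": "AVE",
    "ROAD": "RD", "RD": "RD",
    "BOULEVARD": "BLVD", "BLVD": "BLVD",
}


def normalize_address(raw):
    """Upper-case, strip punctuation, and canonicalize a trailing street type."""
    tokens = re.sub(r"[.,#]", " ", raw.upper()).split()
    if tokens and tokens[-1] in SUFFIXES:
        tokens[-1] = SUFFIXES[tokens[-1]]
    return " ".join(tokens)


print(normalize_address("123 Main Street"))   # 123 MAIN ST
print(normalize_address("123 Main St."))      # 123 MAIN ST
print(normalize_address("123 Maine Street"))  # 123 MAINE ST
```

Normalizing abbreviations makes Street and St. compare as equal, but it
does nothing for a misspelling such as Maine for Main; that still needs
a phonetic or fuzzy comparison against reference data.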
The quest for good, clean, accurate data is not just about geocoding
but about all of the other uses for clean data. The pursuit of clean
data is not that different from other great searches and all the
questions they raised. Columbus didn't have good data and really
didn't find India, but believed he did. Cortez had the same problem
and without local help would still be stuck on the beach. Bad data
always initiates the search for good data, but only once it is
discovered to be bad.
The questions for geocoding, data quality and ultimately ETL are
these: If we don't know the data is bad, is good simply relative? Does
good data remain good when transformed? And is the quest for good data
as important as the data itself?
For the answers to these and other questions, explore this issue of Location Intelligence Magazine.

