MetaCarta Implements Machine Learning, Connects with Document Management

By Adena Schutzberg


This week MetaCarta introduced version 3.0 of its geographic intelligence products. Recall that MetaCarta offers tools to find and make sense of geographic text references in documents. Once that's done, the documents can be located on a map or searched spatially, among other things.

I spoke with Bob Warren, the Vice President of Products, to understand what's new in the release. He identified two "big things": machine learning and a connector framework. Machine learning sounds scary, but it's actually very familiar to anyone in geospatial circles who's used "training" in a raster analysis or feature extraction package. In those software offerings the user identifies examples of the areas in which they are interested. So, for feature extraction, one might manually identify a series of houses on a raster image, and tell the software, "find more like these!" The software deduces from the examples what "makes a house a house." It essentially teases out "rules" for determining "what is a house."

Machine learning algorithms and models have been around for a while and have been widely used in image processing and speech recognition. (See for example a detailed discussion here.) What MetaCarta's staff has done is apply machine learning to language. That sounds hard, but if you recall that many of the staff members are natural language experts, it sounds more possible. In fact, Warren concedes, once the goal had been defined, it took a good six months to make it a reality.

Before I mislead readers into thinking this is a fully automated process, it's not. Just as "training" for remote sensing requires iterations and tweaking by the user, so too do MetaCarta's tools. Warren refers to that as "tuning." Most "trained" software gets "better" at doing what you are asking of it with more examples. The same is true in the linguistic world.

What does this mean for end-users? Basically, it gives them more confidence in the searches they do against their data sets. Search engine weenies actually measure the confidence in results with what's called an "F measure." Warren had to explain this statistic. It basically takes into account both the precision of search (the relevance to the searcher of the items that are retrieved) and its recall (the proportion of relevant information that is retrieved by a search). The number varies from 0 to 1 with 1.0 being, well, perfect.
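For readers who want to see the arithmetic, here is a minimal sketch of how an F measure is commonly computed. Warren did not give a formula, so this assumes the usual definition: the (optionally weighted) harmonic mean of precision and recall. The function name, the beta parameter and the document IDs are made up for illustration and have nothing to do with MetaCarta's software.

```python
def f_measure(retrieved, relevant, beta=1.0):
    """Combine precision and recall into a single score between 0 and 1.

    retrieved: IDs of the documents the search returned (hypothetical data)
    relevant:  IDs of the documents that actually answer the query
    beta:      weight of recall relative to precision; 1.0 weighs them equally
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        # Nothing relevant was retrieved, so the score is zero.
        return 0.0
    precision = hits / len(retrieved)  # share of retrieved items that are relevant
    recall = hits / len(relevant)      # share of relevant items that were retrieved
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative search: four documents come back, two of the three relevant ones among them.
returned = ["doc1", "doc2", "doc3", "doc4"]
wanted = ["doc1", "doc2", "doc5"]
score = f_measure(returned, wanted)
print(round(score, 3))
```

In this made-up search, precision is 2 of 4 and recall is 2 of 3, so the score lands between the two and nearer the lower one. A search that returned exactly the relevant documents, and nothing else, would score 1.0.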

What does this mean for MetaCarta? Quite a bit. First off, it can help in internationalization. Providing examples in, say, French, for French documents in the oil industry provides the foundation for internationalization. To date, MetaCarta's offerings are English only. Second, enhanced machine learning allows MetaCarta to easily move its offering from one industry (which has its own language and structures for geospatial information) to another (which has a different one).

The second "big thing" in the new release is the connector framework. It's just what you'd think: a developer kit to plug MetaCarta's tools into your favorite document management system. As Warren put it, "the framework provides all the tools that would be the same from connector to connector" so that the developer really need only deal with the API of the application to get the job done. MetaCarta used the framework to create its first commercial connector for EMC's Documentum, the most widely used document management system. Warren noted that it's the first of several planned connectors. He sees many benefits in the relationship with Documentum, the most important of which are a shared customer base, which asked for such a link between the products, and the opportunity to explore new areas of business with such a large player. MetaCarta is still a small company, he notes. Documentum, I suspect, sees the relationship as yet another way to differentiate its offerings.

I think of MetaCarta as offering "the rest of the story" when it comes to basic geospatial requirements. If Google or ESRI or MapInfo offer the tools to "put geospatial data on the map," it's MetaCarta that can "turn a pile of unstructured text in documents into geospatial data." I asked Warren how the recent surge in interest in Local Search has affected MetaCarta. He sees both pros and cons. The pros, he notes, include a better understanding by businesses of the value of geographic search and the raising of the bar in terms of user interfaces, especially by the likes of Google. On the down side, he notes confusion between what MetaCarta does and what Google et al. do with geospatial data. The local search companies, Warren is quick to point out, simply use "fielded" or "structured" data to geocode addresses. MetaCarta goes one step further, doing the same with unstructured data.

It seems to me that MetaCarta's move to play with the document management side of information technology will be a huge step forward in increasing its mindshare and marketshare far afield from GIS. There are indications that is already happening. At one point in our interview Warren had to stop himself, remembering, "oh yeah, you are a GIS person," which suggested to me that he's been speaking with far more people who are not in recent days.


Published Tuesday, October 25th, 2005





© 2016 Directions Media. All Rights Reserved.