MetaCarta, Inc.: Geographical Text Searching

By Joe Francica

From time to time, I come across companies that employ geospatial technology in ways that foster further growth of applications, sometimes in ways we might never have thought possible or plausible. MetaCarta, a company based in Boston, uses a text search algorithm that retrieves content based on geographic keywords. Its flagship product, Geographic Text Search (GTS), can confine searches to a geographic area, retrieve the information it detects using the keywords, and then display that information on a map interface. In January, MetaCarta picked up $6.5 million in second-round funding from Sevin Rosen Funds, one of the premier venture capital firms for the IT industry.

The company was founded about three years ago with support from the Defense Advanced Research Projects Agency (DARPA) and from In-Q-Tel, a private fund established by the Central Intelligence Agency. The company's original mission was to create a link between content management systems (CMS) and GIS. "What's really interesting when you look at the IT landscape is how much of it doesn't talk to each other. When people think about text documents they don't think of maps, but the stuff they are talking about in the text can be interpreted, by a human, as being geographic," says John Frank, founder of MetaCarta. "So we wanted to fix this disconnection point by building technology that would parse a textual document and automatically identify references that the software could translate into actual latitude and longitude coordinates. We call that 'geoparsing.'"

Geoparsing, as defined by MetaCarta, comprises three core technologies: natural language parsing, GIS, and system protocols and formats, such as the Simple Object Access Protocol (SOAP) and Extensible Markup Language (XML), that allow these systems to interoperate. One of MetaCarta's strengths is the ability to combine any CMS with any GIS. A text search for a geographic reference displays the location of that reference on a map interface, where an icon marks the location of each text reference. A user can click an icon to retrieve the specific document.
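
To make the "icon on a map linked to a document" idea concrete, here is a minimal sketch of the data a map interface might receive. It assumes each geoparsed reference carries a document link, the matched text and a latitude/longitude pair, and it uses GeoJSON purely as a convenient, widely supported map format; the `GeoReference` class and its field names are hypothetical and are not MetaCarta's actual output schema.

```python
import json
from dataclasses import dataclass


@dataclass
class GeoReference:
    """One geographic reference found in a document (hypothetical schema)."""
    doc_url: str     # link back to the source document
    place_text: str  # the text string that was recognized, e.g. "Boston"
    lat: float       # latitude in decimal degrees
    lon: float       # longitude in decimal degrees


def to_map_features(refs):
    """Turn references into a GeoJSON FeatureCollection: one point (icon) each."""
    features = []
    for ref in refs:
        features.append({
            "type": "Feature",
            # GeoJSON orders point coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [ref.lon, ref.lat]},
            # Properties let a map client label the icon and link to the document.
            "properties": {"name": ref.place_text, "document": ref.doc_url},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    refs = [GeoReference("http://example.com/report1.txt", "Boston", 42.3601, -71.0589)]
    print(json.dumps(to_map_features(refs), indent=2))
```

Clicking an icon in a map client would then simply follow the `document` property back to the source text.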

One of the applications in which MetaCarta has found a niche is the oil patch. Having spent a number of years in that business myself, I can tell you that the ability to search documents such as field reports, legal documents, and research literature for specific, geographically referenced information is fundamental to gaining a competitive advantage. In this latest round of investment, one of the companies supporting MetaCarta was ChevronTexaco's venture arm because, as Mr. Frank points out, the company saw some unique possibilities for solving difficult problems in the petroleum exploration and production phase. [As an aside, Allan Nunns, General Manager of ChevronTexaco Information Technology Company, will be the keynote speaker at Directions Magazine's "Location Technology & Business Intelligence Executive Symposium" this May.]

So how would a company implement a MetaCarta solution? The outer layer of a MetaCarta system is a web services interface using SOAP. Two classes of products use this web interface: GTS and GeoTagger, a tool that finds explicit or implied geographic references and then "tags" them using XML. The most important part of this process is getting documents "in" from whatever systems the environment has, such as shared network drives, CMSs, email systems, or websites.
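
The article does not describe GeoTagger's XML schema, so the sketch below only illustrates the general idea of "tagging" references inline. It assumes the parser has already produced character offsets and coordinates for each reference; the `<document>` and `<place>` element names, the `lat`/`lon` attributes and the `tag_places` function are all hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_places(text, spans):
    """Wrap geographic references in <place> elements.

    spans: (start, end, lat, lon) tuples with non-overlapping character offsets.
    The <document>/<place> names and lat/lon attributes are hypothetical.
    """
    root = ET.Element("document")
    root.text = ""
    last = None  # most recent <place>; plain text after it goes in its .tail
    cursor = 0
    for start, end, lat, lon in sorted(spans):
        plain = text[cursor:start]
        if last is None:
            root.text += plain
        else:
            last.tail = plain
        last = ET.SubElement(root, "place", lat=str(lat), lon=str(lon))
        last.text = text[start:end]
        cursor = end
    # Remaining text after the final reference (or the whole text if none).
    if last is None:
        root.text += text[cursor:]
    else:
        last.tail = text[cursor:]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    text = "Drilling began near Houston last year."
    start = text.index("Houston")
    print(tag_places(text, [(start, start + len("Houston"), 29.76, -95.37)]))
```

Keeping the original text intact and only wrapping the references means the tagged output can be stripped back to the source document, which is one reasonable design for a tool that feeds other systems.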

The next stage is processing the documents retrieved from these various systems, which involves identifying geographic metadata. The collected metadata can follow one of two pathways, depending on which of the two products is used. With GeoTagger, the geographic metadata is sent to an external database. Alternatively, the metadata goes into the text search indices of MetaCarta's own GTS product. When a user searches through a browser interface or through a system such as ESRI's ArcGIS, the search runs against those indices.
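
As a rough picture of how an index can answer a query that is confined by both a keyword and a geography, here is a toy in-memory sketch. It is an assumption-laden illustration, not MetaCarta's GTS index design: it pairs an ordinary inverted index (word to document ids) with the coordinates produced by geoparsing, and filters keyword hits by a latitude/longitude bounding box. The `GeoTextIndex` class and its methods are hypothetical.

```python
import re
from collections import defaultdict


class GeoTextIndex:
    """Toy index pairing word postings with each document's coordinates.

    A hypothetical illustration, not MetaCarta's GTS index format.
    """

    def __init__(self):
        self.postings = defaultdict(set)    # word -> ids of documents containing it
        self.locations = defaultdict(list)  # doc id -> [(lat, lon), ...] from geoparsing

    def add(self, doc_id, text, coords):
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(doc_id)
        self.locations[doc_id].extend(coords)

    def search(self, word, bbox):
        """Ids of documents containing `word` with a reference inside `bbox`.

        bbox = (min_lat, min_lon, max_lat, max_lon); boxes crossing the
        180th meridian are not handled in this sketch.
        """
        min_lat, min_lon, max_lat, max_lon = bbox
        hits = []
        for doc_id in sorted(self.postings.get(word.lower(), ())):
            if any(min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
                   for lat, lon in self.locations[doc_id]):
                hits.append(doc_id)
        return hits
```

A production system would use a real spatial index rather than scanning every keyword hit, but the query shape (keyword plus geographic constraint) is the same.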

This entire process of retrieving geographic references from documents, creating the metadata, and building the subsequent indices is what MetaCarta calls geoparsing. MetaCarta has created a statistical natural language parser that goes through documents word by word, looking for possible interpretations, specifically geographic ones. The parser examines the statistical sequencing from word to word to review the full context of a word string, so that when it encounters references such as country or city names, it can deduce that they are geographic in nature. Likewise, as geoparsing continues through a document and encounters other phrases with related content, the parser can make a reasonable judgment about whether a reference relates to words encountered earlier. For example, Mr. Frank explained that if the word "Cambridge" is identified in a document and the word "Massachusetts" appears further down in the passage, the geoparser concludes that the earlier "Cambridge" is indeed the city in Massachusetts and not the one in England.
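
The Cambridge example can be illustrated with a deliberately simple stand-in. The sketch below is rule-based, not statistical: it uses a tiny hypothetical gazetteer and picks the candidate whose containing region is mentioned anywhere in the document, falling back to a default reading otherwise. MetaCarta's parser, as described above, weighs word-to-word context statistically; the `GAZETTEER` table and `disambiguate` function are illustrative assumptions only, and the coordinates are approximate.

```python
import re

# Hypothetical gazetteer: each ambiguous name maps to candidate places,
# each with the (single-word) region that contains it and rough coordinates.
GAZETTEER = {
    "cambridge": [
        {"region": "england", "lat": 52.2053, "lon": 0.1218},
        {"region": "massachusetts", "lat": 42.3736, "lon": -71.1097},
    ],
}


def disambiguate(name, text, default_index=0):
    """Pick the candidate whose containing region also appears in the text."""
    candidates = GAZETTEER[name.lower()]
    words = set(re.findall(r"[a-z]+", text.lower()))
    for cand in candidates:
        if cand["region"] in words:
            return cand
    # No contextual evidence in the document: fall back to a default reading.
    return candidates[default_index]


if __name__ == "__main__":
    doc = "The lab in Cambridge opened a branch. Officials in Massachusetts approved it."
    print(disambiguate("Cambridge", doc))
```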

"When we ship the software to a customer, the geoparsing module, which is the core of the whole system, is really a giant set of data, a bunch of numbers that are used for interpreting the text at the customer site. And we built up those numbers using our own set of proprietary tools that process a giant plethora of text, hundreds of millions of documents that we've accumulated, in order to train the systems to recognize what the author really intended in terms of geography," said Mr. Frank. "When the metadata comes out, it has coordinates and a confidence probability of how sure the software is that the coordinates it is outputting are the ones the author would have chosen."
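
One simple way to attach a confidence value to each output, sketched here under loudly stated assumptions, is to give every candidate location a score and normalize the scores so they sum to one. The `prior` weights below stand in for the "bunch of numbers" a trained system would ship with, but they are made-up illustrative values, and `context_boost` is an arbitrary hypothetical parameter. This is a naive normalization, not MetaCarta's trained model, and the result behaves like a probability only to the extent the scores are proportional to real likelihoods.

```python
def geoparse_with_confidence(candidates, context_words, context_boost=5.0):
    """Return the best candidate's coordinates plus a confidence in [0, 1].

    candidates: dicts with "lat", "lon", "region" and a hypothetical "prior"
    weight; context_words: set of lowercase words found in the document.
    """
    scores = []
    for cand in candidates:
        score = cand["prior"]
        # Boost candidates whose containing region is mentioned in context.
        if cand["region"] in context_words:
            score *= context_boost
        scores.append(score)
    total = sum(scores)
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return {
        "lat": candidates[best]["lat"],
        "lon": candidates[best]["lon"],
        # Normalized share of the total score, read as a confidence.
        "confidence": scores[best] / total,
    }
```

With priors of 0.6 for Cambridge, England and 0.4 for Cambridge, Massachusetts, a document that never mentions Massachusetts yields England at confidence 0.6, while a mention of "massachusetts" raises the Massachusetts score to 2.0 and makes it the output at roughly 0.77.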

Figure 1 below is a map of downtown Boston showing the locations of geographic references from a text search on the word string "GIS." As noted above, the icons show the locations of references found within the documents being searched.

Figure 1.

In Figure 2, the MetaCarta system retrieves the geographic reference, its latitude and longitude coordinates, and the corresponding text string containing the reference, along with links to the document and the map.

Figure 2.

Published Friday, March 12th, 2004

