First, let me set the timeline. The preliminary report was made available to CEGIS, and the public, back in August of last year. The final version came out in December and the press release announcing the report appeared late in January. So, by the time we discussed the report last week, CEGIS had already made a plan to address the recommendations, including staffing up.
In the podcast, we noted that the National Research Council report had observed that The National Map (TNM) did not seem to know what a "canyon" was. Joe Francica and I conducted some "real-time" research to test whether The National Map and Google Maps did, indeed, know about the Grand Canyon. I ran a query for the Grand Canyon at TNM and learned, as I noted in the podcast, that I had to choose a feature type to search. I chose "state," with disappointing results. Usery explained that had I, in fact, selected "valley," I'd have been more successful and would have found the Grand Canyon. We never provided the answer regarding Google, but Usery finished the research and reported that Google Earth could not find the Grand Canyon and that Google Maps found just a point.
From a technical standpoint, my experience using the TNM viewer makes sense. The Geographic Names Information System (GNIS) is a key component of TNM, providing information on place names to a number of applications. The system requires a feature type for an effective search of the database. That highlights two of the projects already underway at CEGIS: user-centered design and ontology research. The former aims to simplify the user interface and make searches, like the one I tried, more intuitive. The latter aims to build an ontology (in short, a map of the meanings of words and their relationships) so that the user would not need to know, for example, what sort of feature the Grand Canyon is.
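To make the ontology idea concrete, here is a minimal sketch of how a term ontology could let a gazetteer search succeed without the user picking a feature type. Everything here is illustrative: the term mappings, the `Valley` class label, and the toy gazetteer are assumptions, not the real GNIS schema or search logic.

```python
# Hypothetical ontology: colloquial terms mapped to a GNIS-style feature
# class, so a query containing "canyon" resolves to the class the
# database actually uses ("Valley"). Purely illustrative data.
ONTOLOGY = {
    "canyon": "Valley",
    "gorge": "Valley",
    "ravine": "Valley",
    "peak": "Summit",
    "mount": "Summit",
}

# A toy gazetteer of (name, feature_class) pairs.
GAZETTEER = [
    ("Grand Canyon", "Valley"),
    ("Grand Teton", "Summit"),
]

def search(query: str):
    """Match gazetteer names, inferring the feature class from any
    ontology term found in the query instead of asking the user."""
    words = query.lower().split()
    inferred = {ONTOLOGY[w] for w in words if w in ONTOLOGY}
    results = []
    for name, fclass in GAZETTEER:
        if query.lower() in name.lower() or name.lower() in query.lower():
            # If the query implied a class, filter on it; otherwise accept.
            if not inferred or fclass in inferred:
                results.append((name, fclass))
    return results

print(search("grand canyon"))  # -> [('Grand Canyon', 'Valley')]
```

The point of the sketch: the burden of knowing that a canyon is, in GNIS terms, a valley moves from the user into the ontology.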
Our podcast discussion of fusion was also of interest to Usery because of changes to the original mission of TNM. The original vision for the project was to collect data from state, local, tribal and other sources and pass them through (publish them) for end users. That was quite a challenge because the data are of different quality depending on the source, use different symbology, don't match up, etc. Consequently, the plan for TNM changed from being basically a server of distributed GIS data to being a centralized database of GIS data. The data in the TNM database(s) are to be processed beforehand to ensure quality and consistency and to enable the generation of maps and other derived products at a consistent scale and with consistent symbology. CEGIS is exploring exactly how to do that processing, much of which it hopes will be automated. I picked up this tidbit: data with root mean square error values of roughly 6 to 7 meters "match up" (overlay) well enough for most uses.
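That 6-to-7-meter rule of thumb is easy to illustrate. The sketch below computes a positional RMSE between corresponding control points in two hypothetical source datasets; the coordinates are invented for the example, and the threshold is just the podcast's heuristic, not a USGS specification.

```python
import math

def rmse(points_a, points_b):
    """Root mean square error of the positional offsets between two
    datasets' corresponding control points (coordinates in meters)."""
    sq = [(ax - bx) ** 2 + (ay - by) ** 2
          for (ax, ay), (bx, by) in zip(points_a, points_b)]
    return math.sqrt(sum(sq) / len(sq))

# Illustrative control points from two hypothetical sources.
a = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
b = [(3.0, 4.0), (104.0, 3.0), (-5.0, 100.0)]

err = rmse(a, b)
# Per the rule of thumb, datasets overlay acceptably when RMSE is
# roughly 6-7 m or less.
print(f"RMSE = {err:.2f} m -> "
      f"{'overlays acceptably' if err <= 7.0 else 'mismatch'}")
# -> RMSE = 5.00 m -> overlays acceptably
```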
The processing of the data coming into TNM puts the focus back on the eight data layers it includes. It also ensures that the data are validated or "stamped with USGS approval," something that can't be said of other online data sources. That, Usery hopes, will distinguish TNM as a "trusted data source."
The projects I've discussed above are included in the short-term goals for TNM. Projects on user-centered design, ontology and fusion are already underway, and the hope is that results of those efforts will appear in the application in the near future (think years, not months). That said, one project is rather far along. The Cartographic Research group, which existed before CEGIS was created in 2006, began a study of generalization some years ago. The result, a software package, can generalize 1:24,000-scale hydrography to any scale. That service will hopefully be available online soon as part of TNM. The longer-term vision for CEGIS and TNM is equally or perhaps even more interesting and revolves around what's essentially a new database structure for spatial data. It would include:
- A spatial/temporal data model (different versions of a feature could exist based on time - think of changing shorelines, for example)
- A quality-aware data model (each feature would carry its own metadata, so its quality would be known)
- Transaction processing (editing could be done on a feature-by-feature basis, in contrast to "locking out" a whole layer to edit just one feature)
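The three ideas above fit together naturally in a single feature record. Here is a minimal sketch of such a structure; the class and field names (`FeatureVersion`, `accuracy_m`, etc.) are my own illustration, not an actual TNM schema.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureVersion:
    geometry: list      # e.g. list of (x, y) vertices
    valid_from: str     # ISO date this version became current (temporal)
    accuracy_m: float   # positional accuracy carried with the feature
    source: str         # lineage metadata (quality-aware)

@dataclass
class Feature:
    feature_id: str
    versions: list = field(default_factory=list)

    def current(self):
        """The most recent version of the feature."""
        return self.versions[-1]

    def edit(self, new_version: FeatureVersion):
        # A feature-level "transaction": only this feature is updated,
        # and earlier versions remain queryable; no layer-wide lock.
        self.versions.append(new_version)

shoreline = Feature("shoreline-042")
shoreline.edit(FeatureVersion([(0, 0), (1, 0)], "1999-01-01", 12.0, "state DOT"))
shoreline.edit(FeatureVersion([(0, 0), (1, 1)], "2007-01-01", 3.0, "lidar survey"))
print(shoreline.current().valid_from)  # -> 2007-01-01
```

Because each version carries its own accuracy and lineage, the "metadata travels with the feature" goal falls out of the data model rather than being bolted on.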
I asked Usery if he knew of a situation where government research, in this case from CEGIS, ended up touching citizens, or at least geospatial professionals. He offered this terrific story, which again features the Cartographic Research group. In 1999 that group realized that when spherical raster data sets (such as the global land cover database, at 30 arc seconds) were projected to planar coordinates using some commercial packages, there were "wrap around" problems. That is, Alaska sometimes appeared twice - once on the east side of the map and once on the west. So, the team wrote software to fix it, called MapIMG. When the team presented its work to an audience that included several commercial software vendors, the vendors didn't adopt the freely available code, but they did go back to their offices and implement the fix in their own products.
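The wrap-around problem has a simple vector analogue that shows the underlying issue (this is not how MapIMG itself works internally, which handles raster resampling): longitudes near the 180th meridian jump from +179 to -179, so a naive planar projection draws one feature on both edges of the map. Re-centering the longitude window keeps the feature contiguous. The `center` value below is just an illustrative choice.

```python
def normalize_lon(lon, center=0.0):
    """Shift a longitude into the 360-degree window centered on `center`,
    so a feature crossing the projection edge stays contiguous."""
    while lon < center - 180.0:
        lon += 360.0
    while lon >= center + 180.0:
        lon -= 360.0
    return lon

# Alaska's Aleutian Islands cross the 180th meridian: raw longitudes
# jump from +179.5 to -179.5, which a naive projection splits across
# both map edges. Centering the window on -150 keeps them in order.
aleutians = [178.0, 179.5, -179.5, -178.0]
print([normalize_lon(l, center=-150.0) for l in aleutians])
# -> [-182.0, -180.5, -179.5, -178.0]
```

After normalization the longitudes increase monotonically, so the island chain plots as one contiguous feature instead of wrapping around.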