Representing and serving large volumes of geospatial data on the Web raises problems of both performance and usability.
Regarding usability, users often face many data items displayed simultaneously and stacked over the same geographical area, which makes it difficult to tell them apart and to see the underlying base map.
Regarding performance, the problems affect the server, the network and the client. On the server, large amounts of data require many database queries and a large amount of memory per request. On the network, large amounts of data travelling over the Internet delay requests and consume considerable bandwidth. On the client, current Web browsers have trouble rendering large amounts of data.
This article presents improvements in these areas through level-of-detail and clustering techniques, and through the generation and precaching of vector data tiles in a manner similar to what is already done with raster data. It also shows how Extract-Transform-Load (ETL) techniques were used to integrate data obtained from different Web sources into a single repository.
The main technological objective of the GEOStore project is industrial research into new types of geolocated digital content, such as 3D models and augmented reality, together with the definition of new techniques for packaging and distributing digital content on the Web in order to facilitate new business models.
To achieve these goals, the project also addresses two secondary objectives: maximizing interoperability with existing and future platforms, and improving on the performance and usability of current processes so that large volumes of data can be handled on the Web and on mobile devices.
Once the project is completed, its results are expected to support technological development in growing digital content sectors such as education, geomarketing, tourism and urban management, among others.
Moreover, these results can be applied in any sector that develops geolocated digital content for the information society, since the resulting processes will facilitate the creation of platforms, and interoperability between them, for distributing, marketing and sharing digital content via social networks. The use of these techniques will strategically position the sectors applying them in the Web 2.0 market.
This article shows the results obtained thus far in terms of integration of existing geospatial data sources, vector performance and usability.
2. INTEGRATION OF EXISTING GEOSPATIAL DATA SOURCES
Once the data sources relevant to the project had been identified and characterized by license type and accessibility, a demonstrator was implemented with the ETL tool GeoKettle to import and integrate geolocated data into a single spatial database, PostGIS, which will be the central GEOStore repository.
To implement this demonstrator, three generic and widely used information sources were selected: Wikipedia, Flickr and OpenStreetMap. All three are crowdsourced, meaning that the users themselves voluntarily maintain and add new information.
The three sources share a common feature: the information that can be queried and, ultimately, downloaded can be located at a specific spot on the earth's surface; in other words, it is geolocated information. However, the three sources differ completely in the format and nature of the data they contain (text and graphics in the case of Wikipedia, photographs for Flickr, and geometry and attribute values in the case of OpenStreetMap).
To overcome this difficulty, a GeoKettle workflow was designed. It relies on APIs (in the case of Flickr) and on SQL scripts and statements to download the information from the network, transform it and store it in the PostGIS database.
Fig. 1. General workflow (job) defined in GeoKettle
The entire process shown in Figure 1 runs in parallel, so the information to be stored is obtained from Wikipedia, Flickr and OpenStreetMap at the same time.
Data from OpenStreetMap and Wikipedia were downloaded directly from the network. Once downloaded, the data were processed in GeoKettle and then imported into the central database using the corresponding database connectors (MySQL and PostgreSQL).
Fig. 2. Workflow for downloading georeferenced articles from Wikipedia
Fig. 3. Workflow for downloading georeferenced data from OpenStreetMap
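The transformation step shared by the three sources, reducing heterogeneous records to a common geolocated form before loading them into PostGIS, can be sketched as follows. This is only an illustration in Python, not the GeoKettle workflow used in the project: the `GeoItem` record, the `geostore_items` table and its columns are hypothetical names, and the `%s` placeholders assume a PostgreSQL driver such as psycopg2. `ST_MakePoint` and `ST_SetSRID` are standard PostGIS functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoItem:
    """Hypothetical common record for items from any of the three sources."""
    source: str  # e.g. "wikipedia", "flickr" or "osm"
    title: str
    lon: float   # WGS84 longitude in degrees
    lat: float   # WGS84 latitude in degrees


def to_insert(item, table="geostore_items"):
    """Return a parameterized PostGIS INSERT statement and its parameters.

    The table and column names are assumptions made for this sketch.
    """
    if not (-180.0 <= item.lon <= 180.0 and -90.0 <= item.lat <= 90.0):
        raise ValueError(f"coordinates out of range: {item}")
    sql = (
        f"INSERT INTO {table} (source, title, geom) "
        "VALUES (%s, %s, ST_SetSRID(ST_MakePoint(%s, %s), 4326))"
    )
    # ST_MakePoint takes x (longitude) first, then y (latitude).
    return sql, (item.source, item.title, item.lon, item.lat)
```

Keeping the values as parameters rather than formatting them into the SQL string lets the database driver handle quoting of titles downloaded from the Web.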
In the case of Flickr, its REST API was used to download geolocated data. Flickr limits the number of elements returned per request to prevent massive downloads. It was therefore necessary to automate the generation of requests (as URLs) that invoke the API without exceeding that limit. The automation was implemented as a recursive Python script. After the data were downloaded in XML format, they were parsed and imported into the central database from GeoKettle.
Fig. 4. Workflow for downloading georeferenced data from Flickr
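The text does not say how the recursive script partitioned its requests. One plausible approach, sketched below as an assumption, is to subdivide the bounding box into quadrants until each box holds no more elements than the limit, and to emit one request URL per box. The function names, the `count_fn` callback (which would report how many elements fall in a box) and the `YOUR_KEY` placeholder are hypothetical. The endpoint, the `flickr.photos.search` method and its `bbox` and `per_page` parameters belong to the public Flickr API.

```python
from urllib.parse import urlencode

REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def search_url(bbox, api_key, per_page):
    """Build one flickr.photos.search request URL for a bounding box.

    bbox is (min_lon, min_lat, max_lon, max_lat) in degrees.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "bbox": ",".join(f"{v:.6f}" for v in bbox),
        "per_page": per_page,
        "format": "rest",  # XML response
    }
    return REST_ENDPOINT + "?" + urlencode(params)


def split_requests(bbox, count_fn, limit, api_key="YOUR_KEY", max_depth=12):
    """Recursively split bbox until each box holds at most `limit` elements.

    count_fn(bbox) is a caller-supplied function (hypothetical here) that
    returns the number of elements inside the box.
    """
    if count_fn(bbox) <= limit or max_depth == 0:
        return [search_url(bbox, api_key, limit)]
    min_lon, min_lat, max_lon, max_lat = bbox
    mid_lon = (min_lon + max_lon) / 2
    mid_lat = (min_lat + max_lat) / 2
    quadrants = [
        (min_lon, min_lat, mid_lon, mid_lat),
        (mid_lon, min_lat, max_lon, mid_lat),
        (min_lon, mid_lat, mid_lon, max_lat),
        (mid_lon, mid_lat, max_lon, max_lat),
    ]
    urls = []
    for quadrant in quadrants:
        urls.extend(
            split_requests(quadrant, count_fn, limit, api_key, max_depth - 1)
        )
    return urls
```

The `max_depth` guard stops the recursion when many elements share the same coordinates and no amount of subdivision would bring a box under the limit.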
3. VECTOR PERFORMANCE AND USABILITY ON THE WEB
OGC/ISO standards were used for access to vector data, establishing a standard format and storage protocol to transmit vector GIS information. Likewise, these standards support representational capabilities (3D support, augmented reality, etc.) and functional capabilities (editing, transactions, payment details, etc.).
Moreover, various research activities have been conducted to increase the capacity for reading vector data using OGC standards. Open source APIs for OSM are used to download and create points of interest from the Internet for Web clients and mobile platforms (“Wikipedia sites”).
With regard to techniques for improving the performance and usability of Web and mobile clients, research has been conducted on creating components based on multilevel visualization techniques and space partitioning techniques via:
Grouping based on levels of detail
Tiling splits geographic information according to its location into a hierarchical, tree-shaped structure. The levels of the hierarchy correspond to the visualization levels defined in the display. Each level has four times as many tiles as the level immediately above it. This model is well established in Web servers that offer raster cartography via WMS-C, TMS and other protocols. In this case, the attributes and geometry of the vector elements of the point layer are stored in each tile of each level.
Fig. 5. Distribution of geographic elements in the tile structure by level of visualization
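The tile hierarchy described above can be illustrated with a short sketch. The article does not state which tiling scheme the project uses, so this example assumes the common Web Mercator XYZ scheme, in which level `zoom` has 2^zoom by 2^zoom tiles and the row index grows southwards (TMS uses the same grid with the row axis flipped). The function names are hypothetical.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return (x, y) of the XYZ tile containing a WGS84 point at `zoom`."""
    n = 2 ** zoom  # number of tiles along each axis at this level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the edges or near the poles stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def children(x, y, zoom):
    """Return the four (x, y, zoom) tiles at zoom + 1 covering one tile."""
    return [(2 * x + dx, 2 * y + dy, zoom + 1)
            for dy in (0, 1) for dx in (0, 1)]
```

Because every tile has exactly four children, the number of tiles is multiplied by four at each level, which is what allows the point elements to be distributed more finely as the user zooms in.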
Furthermore, grouping information by levels of detail makes it possible to combine nearby isolated elements. This reduces the number of elements represented on the map at a given zoom level and makes the map more legible, since it is no longer saturated with information.
Fig. 6. Synthesis model of geographic elements by level of visualization
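The article does not specify the clustering algorithm. As a minimal sketch of grouping by level of detail, the following example assumes a simple grid-based approach: points falling in the same grid cell are merged into one cluster placed at their centroid, and the cell size halves with each zoom level. The function name and the `cells_per_axis_at_zoom0` parameter are hypothetical.

```python
from collections import defaultdict


def cluster_points(points, zoom, cells_per_axis_at_zoom0=4):
    """Merge (lon, lat) points that share a grid cell at the given zoom.

    Returns a list of dicts with the cluster centroid and member count.
    """
    cells = cells_per_axis_at_zoom0 * 2 ** zoom
    size_lon = 360.0 / cells
    size_lat = 180.0 / cells
    buckets = defaultdict(list)
    for lon, lat in points:
        key = (int((lon + 180.0) // size_lon), int((lat + 90.0) // size_lat))
        buckets[key].append((lon, lat))
    clusters = []
    for members in buckets.values():
        n = len(members)
        clusters.append({
            "lon": sum(p[0] for p in members) / n,
            "lat": sum(p[1] for p in members) / n,
            "count": n,
        })
    return clusters
```

With this scheme, two nearby points are drawn as a single symbol at low zoom levels and separate into individual elements once the cells become smaller than the distance between them.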
This model combines the tiling of information by levels with clustering, which makes it possible to serve tiles of vector information that the Web browser requests sequentially as small GeoJSON files, each containing the information for a defined geographical area.
The components created improve performance and usability when displaying and interacting with information, thereby solving the problem browsers have when rendering large numbers of these elements.
Fig. 7. Combined tiling and clustering model of geographic information
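The small GeoJSON files mentioned above can be illustrated with the following sketch, which serializes the clusters of one tile as a GeoJSON FeatureCollection and derives a cache path for it. The cluster dictionaries (with `lon`, `lat` and `count` keys), the function names and the `zoom/x/y.geojson` path layout are assumptions made for this example; the project's actual file layout is not described in the text.

```python
import json


def tile_feature_collection(clusters):
    """Build a GeoJSON FeatureCollection from cluster dicts for one tile."""
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point",
                         "coordinates": [c["lon"], c["lat"]]},
            "properties": {"count": c["count"]},
        }
        for c in clusters
    ]
    return {"type": "FeatureCollection", "features": features}


def tile_path(zoom, x, y):
    """Return a hypothetical cache path for a precomputed vector tile."""
    return f"{zoom}/{x}/{y}.geojson"


def serialize_tile(clusters):
    """Serialize a tile compactly to keep the transferred file small."""
    return json.dumps(tile_feature_collection(clusters),
                      separators=(",", ":"))
```

Precomputing these files for every tile and level lets the server answer a request with a static file, in the same way raster tile caches do.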
The GEOStore project started in 2011 and has been developed over the past two years by the companies Prodevelop S.L. and Geoturismo S.L. and by the Servicio de SIG y Teledetección of the Universidad de Girona, with funding from the Plan Avanza (Digital Content) of the Spanish Ministry of Industry, Tourism and Commerce.
Fig. 8. Project logo and the entities that have funded it
Work is currently under way on the final project demonstrator: an online store for geolocated multimedia products. The portal was developed using Drupal 7 and the e-commerce module. The store will hold a large quantity of geolocated spatial information and make it available to users. The information is presented to the user in listings or on a map, according to its geographical location. In addition, the store will allow any person or entity to add their own geographical elements.
Fig. 9. Appearance of the alpha version of the online GEOStore