Today's geo-enabled Web relies on a host of technologies and processes, notably mashups. But according to authors Giuseppe Conti, Raffaele De Amicis, Federico Prandi (all from Graphitech) and Paul Watson (from 1Spatial), those are just a stepping stone to future technologies and processes that will enable a spatio-temporal Internet of Places using the growing mass of unstructured data resources on the Web.
A significant proportion of the content available on the Internet has a spatial dimension. This may be either explicit or implicit, for instance a place name within a document or the location of a place referred to within a tweet. However, if we take a look from a geospatial standpoint at the various types of Internet resources and applications used today, we see a fairly fragmented picture. The original concept of the “web of documents” has been complemented, in recent years, by various Web 2.0 or “mashup” technologies. An increasing number of real-time sensor data feeds and an unprecedented amount of unstructured crowdsourced information now complement standard geospatial resources. This evolution is multifaceted, leveraging concepts such as the “Internet of sensors,” the “Internet of things” and the geospatial Web.
Despite this added versatility, if we look carefully at the “low level” infrastructure used by those Web 2.0 apps, we see that they are still based on the same simple dereferencing model whereby resources are published at precise network locations, typically defined through a URI; the recent advent of these Web 2.0 technologies and semantic tools has not essentially affected the classic “publication-search-retrieval” paradigm. And, because the core Internet standards lack “native” spatial support, the Web is very limited in its ability to deliver geographical or location-based contextualization of most digital resources available.
This gap has been partially filled by the growing number of mash-ups based on geospatial technologies. Nonetheless, mash-ups only provide a partial solution, designed as they are to operate within well-defined and a priori well-known sets of features. One could claim that, with the advent of mash-ups, the old concept of “information islands,” a popular metaphor used to refer to GI data silos, has given way to “information archipelagos,” new wider clusters of aggregated geospatial and non-geospatial content at a World Wide Web scale.
At the same time, if we analyze the behavior of Internet users today, we observe a radical evolution in the way users are discovering information as they use platforms that provide access to resources without any explicit reference to their Internet or network location. Instead, resources are accessed according to their geographical context or, in the case of location-based services (LBS), according to the physical location of the user.
This radical paradigm shift is not simply a technical one; more importantly, it is a cognitive one, affecting different social and age groups, and it creates the pre-conditions, at the societal level, for true spatio-temporal enablement of the Internet.
Although such a radically different perspective requires addressing a number of fundamental challenges, the existing stack of Web 2.0 and GI technologies could be reused to interweave a “spatio-temporal fabric” on top of both existing and new Internet resources, to ensure spatio-temporal access, thus creating what we call the “Internet of Places.”
Firstly, at the lowest logical level, a uniform data model is required to represent the majority of the Internet resources available. Although a number of models could be used to allow machine-readable exchange of data between applications and across devices, the simplest and probably most popular formalism is based on the definition of a subject and an object linked by a predicate. This model, borrowed from the Semantic Web domain, is both highly homogeneous and expressive. Furthermore, by its nature it supports aggregation, since multiple resources can point to the same subject, and it allows semantic expansion through abstraction layers logically built on top of the low-level subject-predicate-object tuples. It goes without saying that one of the most challenging issues to be tackled when following this approach is the definition of robust methods for creating unique and persistent identifiers.
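The triple model described above can be sketched in a few lines of plain Python; the URIs and predicate names below are purely hypothetical, but they show how aggregation falls out naturally when several statements, including ones from non-spatial resources, point at the same subject.

```python
# A minimal sketch of the subject-predicate-object model using plain
# Python tuples. All "ex:" identifiers here are invented for illustration.
from collections import defaultdict

triples = [
    ("ex:TrentoStation", "ex:locatedIn", "ex:Trento"),
    ("ex:TrentoStation", "ex:hasType", "ex:RailwayStation"),
    ("ex:Tweet123", "ex:mentions", "ex:TrentoStation"),  # non-spatial resource
]

# Aggregation: group every statement made about a given subject,
# regardless of which resource contributed it.
by_subject = defaultdict(list)
for s, p, o in triples:
    by_subject[s].append((p, o))

print(by_subject["ex:TrentoStation"])
```

A real deployment would of course use an RDF store rather than in-memory tuples, but the grouping step is exactly what makes cross-resource aggregation cheap in this model.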
Such a unified model, which could be formalized as RDF, permits uniform and consistent access to all Internet resources. It would also be very effective for creating a set of highly formalized master data resources, or reference geographies, to which other resources can be linked. Crucially, links can then be extended into the unstructured and non-spatial data typical of social networks and user-generated content.
For data to be accessible, publication should occur automatically and through much less structured approaches than those used today, which are typically based on strict registry or catalogue formalisms relying on manual configuration of services bound to specific APIs. Publication models should instead be designed to maximize flexibility and to facilitate crawling, indexing and discovery. Existing technologies such as SPARQL and the OGC WFS/Filter specifications could be adapted and extended to allow delivery of fusible spatial data to higher-level services for integration and enrichment.
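As a concrete illustration of spatially filtered retrieval over such published data, the sketch below builds a SPARQL query in the style of the OGC GeoSPARQL vocabulary. The endpoint, graph contents and exact predicates are assumptions; only the query-building helper itself is shown.

```python
# Hypothetical sketch: constructing a GeoSPARQL-style query that retrieves
# resources whose geometry falls within a bounding box. The data being
# queried, and the assumption that resources expose geo:hasGeometry/geo:asWKT,
# are illustrative, not a description of any existing service.
def bbox_query(min_lon, min_lat, max_lon, max_lat):
    wkt = (f"POLYGON(({min_lon} {min_lat}, {max_lon} {min_lat}, "
           f"{max_lon} {max_lat}, {min_lon} {max_lat}, {min_lon} {min_lat}))")
    return f"""
    PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    SELECT ?resource WHERE {{
      ?resource geo:hasGeometry/geo:asWKT ?geom .
      FILTER(geof:sfWithin(?geom, "{wkt}"^^geo:wktLiteral))
    }}"""

# e.g. a rough bounding box around Trento, Italy
print(bbox_query(10.9, 45.8, 11.2, 46.2))
```

The same pattern could be mirrored on the OGC side with a WFS `Filter` using a `BBOX` operator; the point is that the spatial constraint travels with the query rather than being baked into a manually configured service.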
At a higher level, if we look at the state of the art for discovery and crawling technologies, these are essentially keyword-driven. While this can be very useful to find documents or information that can be easily qualified through keywords, it obviously falls short when dealing with spatio-temporal phenomena not defined in advance, for instance when a user needs to search for “news referring to events within a 10 km radius from my current position.” Being able to process a request like this would require the development of data crawlers capable of discovering, in a completely automatic manner, both explicit and, more importantly, implicit spatio-temporal information, for instance based on an administrative unit or postal code. The rules that identify which implicitly spatial data need to be indexed should be based on programmable, semantic heuristics. This would also ensure maximum flexibility, in that rules could evolve to cater to new types of information or to information that changes over time. Furthermore, machine-learning techniques should be developed to identify regular patterns of implicitly spatially linked resources.
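One way to picture such programmable heuristics is as a table of named rules applied over unstructured text. The sketch below is a deliberately toy version: the two patterns and the sample sentence are assumptions, standing in for the configurable, semantically informed rules the text envisages.

```python
# Sketch of programmable heuristics for spotting *implicit* spatial
# references (postal codes, administrative units) in free text.
# The rule set is illustrative; a real crawler would load rules from
# configuration and back them with a gazetteer and semantic context.
import re

HEURISTICS = {
    "it_postal_code": re.compile(r"\b\d{5}\b"),                 # e.g. 38123
    "admin_unit": re.compile(r"\b(?:province|region) of \w+", re.I),
}

def extract_implicit_locations(text):
    hits = []
    for rule_name, pattern in HEURISTICS.items():
        for match in pattern.finditer(text):
            hits.append((rule_name, match.group(0)))
    return hits

print(extract_implicit_locations(
    "Heavy rain reported near 38123 in the Province of Trento."))
```

Because the rules live in a data structure rather than in code, new heuristics (or machine-learned patterns) can be added without touching the crawler itself, which is exactly the flexibility argued for above.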
What is becoming clear is that traditional cataloguing services, typical of spatial data infrastructures (SDIs), are simply incapable of dealing with such complex scenarios, and we need to move toward services based on the crawling/indexing paradigm, typical of search engines, extended to support spatio-temporal information. For instance, if a website publishes information on pollution over Europe during 2011, the geographical extent can be used to place the resource within a spatial section of the index, the temporal validity can be used to place it within the time section of the index and, finally, its tags, automatically extracted also on the basis of the spatio-temporal features, can be used to allow indexing in the conventional way. This way it would be possible to return the dataset both to those interested in resources covering Europe and to those interested in pollution during 2011.
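The three-part index just described can be sketched as a structure with spatial, temporal and keyword sections, each holding a pointer back to the same resource. The dataset identifier and the (rough) European bounding box below are illustrative assumptions.

```python
# Toy sketch of a spatio-temporal index: each resource is filed under its
# spatial extent, its temporal validity and its extracted tags, so any of
# the three access paths can recover it. All identifiers are invented.
from datetime import date

index = {"spatial": [], "temporal": [], "keyword": {}}

def index_resource(uri, bbox, start, end, tags):
    index["spatial"].append((bbox, uri))            # (min_lon, min_lat, max_lon, max_lat)
    index["temporal"].append(((start, end), uri))   # validity interval
    for tag in tags:
        index["keyword"].setdefault(tag, []).append(uri)

# The example from the text: pollution data over Europe during 2011.
index_resource("ex:eu-pollution-2011",
               bbox=(-11.0, 35.0, 40.0, 71.0),
               start=date(2011, 1, 1), end=date(2011, 12, 31),
               tags=["pollution", "europe"])

print(index["keyword"]["pollution"])
```

A production index would use spatial and interval trees rather than flat lists, but the essential point survives: the same resource is reachable by extent, by validity period, or by topic.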
To do this, indexers should produce 4D maximum bounding hypersolids (MBHs) describing the spatio-temporal extent of each resource. Support for various scales should be integrated with support for different spatio-temporal tolerances embedded within the MBHs. Tolerance should be formalized in a standardized manner, either extracted from explicitly defined metadata or, more interestingly, inferred from the semantic context of the concept being described, for instance a region, a country or a city.
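A minimal data structure for such a hypersolid might carry lower and upper corners over four axes (x, y, z, t) plus a per-axis tolerance that inflates the box during intersection tests. The coordinates below are invented for illustration, and this is one possible reading of how tolerance could be embedded, not a definitive formalization.

```python
# Sketch of a 4D bounding hypersolid with per-axis tolerance.
# Axes: (x, y, z, t); all values here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class MBH:
    lo: tuple                            # (x_min, y_min, z_min, t_min)
    hi: tuple                            # (x_max, y_max, z_max, t_max)
    tol: tuple = (0.0, 0.0, 0.0, 0.0)    # per-axis tolerance

    def intersects(self, other):
        # Inflate each interval by both tolerances, then require
        # overlap on every one of the four axes.
        return all(
            self.lo[i] - self.tol[i] <= other.hi[i] + other.tol[i] and
            other.lo[i] - other.tol[i] <= self.hi[i] + self.tol[i]
            for i in range(4)
        )

a = MBH((10.9, 45.8, 0.0, 0.0), (11.2, 46.2, 100.0, 365.0))
b = MBH((11.1, 46.0, 0.0, 300.0), (11.5, 46.5, 50.0, 400.0))
print(a.intersects(b))  # True: the hypersolids overlap on all four axes
```

Inferring the tolerance from semantic context would then amount to choosing `tol` from the concept class, a wide tolerance for "country", a narrow one for "city", before the MBH is written into the index.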
Additionally, during the indexing process, each resource should also be linked to one or more topics, possibly from within a consistent concept hierarchy. Existing ontologies or data models should be used to map the resources onto the concept hierarchy. Furthermore, GeoNames entries identified during parsing should be used by the indexer to associate the resource with the spatial extent of the corresponding GeoNames entry. The combined use of these approaches would allow search refinement by spatio-temporal extent, by topic of relevance, or both.
The search interface itself should also depart from the current keyword-focused one. Instead, the search engine should allow browsing of ontological concepts of interest, which should be automatically linked to the information within the user’s current field of view to determine the spatio-temporal extent (as well as the scale) of the search. The user could, for instance, select a topic of interest and fly virtually, using a 3D geobrowser, to an area of interest, while the search engine continuously recasts the query according to the user’s point of view, exploring the “information scene.”
Additionally, to be practically viable and scalable, strategies for federated or delegated searching need to be explored, creating service hierarchies based on a master index service delegating to lower-level authoritative services, for instance based on geographical coverage. These in turn should provide caching mechanisms and implement robust data expiry principles to invalidate stale data. This would allow tuning of the caching process in terms of both spatial and temporal granularity, for instance having certain portions of the cache refreshed every 10 minutes in the case of highly dynamic data produced by a social network, while other portions dealing with reference geographies are refreshed annually.
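The per-partition expiry idea can be sketched as a cache keyed by partition, where each partition carries its own time-to-live. The two partitions and their intervals below come from the example in the text; the fetch callables stand in for whatever lower-level authoritative service would actually be consulted.

```python
# Sketch of per-partition cache expiry: highly dynamic partitions refresh
# every few minutes, reference geographies roughly annually. The partition
# names and payloads are illustrative assumptions.
import time

TTL_SECONDS = {
    "social": 10 * 60,                 # 10 minutes, per the example above
    "reference_geo": 365 * 24 * 3600,  # roughly annual
}

cache = {}  # partition -> (timestamp, payload)

def get(partition, fetch):
    entry = cache.get(partition)
    now = time.time()
    if entry is None or now - entry[0] > TTL_SECONDS[partition]:
        cache[partition] = (now, fetch())  # missing or expired: refetch
    return cache[partition][1]

print(get("social", lambda: "fresh tweets"))
print(get("reference_geo", lambda: "admin boundaries, 2011 edition"))
```

In a federated hierarchy the `fetch` callable would be a delegated query to the authoritative service for that geographic coverage, so tuning freshness per partition never touches the delegation logic.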
At a still higher level, a set of data enrichment services should be developed to free the user from the mechanical chore of locating and integrating correct information. Data brokers should mediate requests against existing indexed data sources and infer the required data by joining or refining the component data sources, returning them to the user. These data brokers should support automatic orchestration of the underlying data services to allow aggregation of low-level data in a useful way.
The data brokers should make use of configurable heuristics to associate spatial and non-spatial data and they should intrinsically support the concept of data quality based on provenance and quality metadata. This information, in turn, should be used to provide confidence measures, particularly useful when providing aggregated results. Being able to deal with uncertain data is essential to make sure that, at each stage of the data lifecycle, the user is aware of the quality of the information being returned by the system.
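One simple way a broker could turn provenance metadata into a confidence measure is to attach a quality score to each source and report a quality-weighted aggregate together with the mean source quality. The scores and the weighting scheme below are illustrative assumptions, one of many possible formulations.

```python
# Toy sketch: aggregating observations from sources of differing
# provenance quality, returning a value plus a confidence measure.
# Quality scores and the weighting scheme are illustrative assumptions.
def aggregate(observations):
    """observations: list of (value, quality) pairs, quality in [0, 1]."""
    total_quality = sum(q for _, q in observations)
    # Quality-weighted mean: trustworthy sources pull the result harder.
    value = sum(v * q for v, q in observations) / total_quality
    # Mean source quality as a crude confidence indicator for the result.
    confidence = total_quality / len(observations)
    return value, confidence

value, conf = aggregate([
    (42.0, 0.9),   # e.g. an authoritative sensor feed
    (40.0, 0.5),   # e.g. a crowdsourced report
    (44.0, 0.2),   # e.g. a low-trust, unverified source
])
print(value, conf)
```

Whatever the exact formula, the key property argued for in the text is preserved: the confidence figure travels with the aggregated result, so the user sees not just an answer but how much the underlying provenance supports it.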
Finally, at the highest level, we find the presentation services that support explorative approaches to data discovery, through the use of user-friendly technologies such as augmented reality or 3D geobrowsers. Not only should the user be able to define a concept and fly to an area of relevance, with results presented interactively, but the system should also adapt the presentation of those results to differing software or hardware and to the physical context in which the user is operating. The same result might be rendered in very different ways if the user is working at night on a lightweight mobile application with a small screen connected through a 3G network, or at home on a powerful desktop PC with a high-resolution screen and a broadband connection.
We should certainly not underestimate the technical and conceptual challenges of extending existing core Internet services to accommodate multi-dimensional data natively. However, the prospect of gaining accurate spatio-temporal context for many, if not all, Web resources and applications, while avoiding the complex and brittle manual configuration steps required today, is a prize worth striving for. The Internet of Places is a Web which sees and fuses information together in ways much more like our human imagining than simple keyword searches and mash-ups. In building the next generation of Web information services, we will need to dissolve the artificial barriers that surround spatio-temporal and other multi-dimensional data, and doing so will bring the substance of the virtual world an intuitive step closer.
Authors’ note: The achievements of this paper, contributed by Graphitech, have received funding from the EC through the Seventh Framework Programme under Grant Agreement n. 234239 (project “i-Tour”), through the project BRISEIDE, co-funded by the CIP-ICT Programme, and through the project NatureSDI+, co-funded by programme ECP 2007 Geo317007. The authors are solely responsible for this work, which does not represent the opinion of the EC. The EC is not responsible for any use that might be made of information contained in this paper.