Local Search: State of the Art and Challenges Ahead

By Marty Himmelstein

If Internet users have heard of local search, they are likely to think of it as the Internet equivalent to the Yellow Pages. This association is natural, because the Internet Yellow Pages (IYP) have been one of the most popular applications since Yahoo! made the Internet accessible to the general public. That was ten years ago, and the state of the art in local search has remained distressingly stagnant since then.

Even people involved with local search commonly use the term to refer to the techniques and services advertisers employ to improve the return on the money they spend on pay-per-click (PPC) advertising. These paid listings embroider IYP results, city guides and other geographically oriented applications.

There is a natural tendency to conflate local search on the Internet, still in its nascent stages, with the Yellow Pages, a trusted brand name, and an almost idiomatic expression for any source of easily accessible information on storefront businesses. Local search as a synonym for the Yellow Pages reflects today's state of the art, and it will continue to be thought of in this restrictive sense until a more expansive breed of applications becomes available.

A more expansive definition is inevitable. There are few applications on the Internet where the gulf between what is and what should be is so stark. Local search will become a fundamentally useful Internet application. It will encompass the geographically oriented activities of daily life: "what is in my neighborhood and what is happening in my neighborhood." It's the combination of commerce, community and civics.

Local Search Drivers
The catalysts that will contribute to the development of a more comprehensive local search are either in place or assembling. The most obvious one is the size of the local advertising marketplace.
Annual YP revenues in the USA hover around $14 billion, and newspaper revenues related to local advertising were close to $40 billion in 2003.

The second fundamental catalyst is the widespread availability of broadband connectivity to the Internet. High speed access is the single most important predictor of the intensity of an individual's Internet usage. According to a report published by the Pew Internet Project, The Broadband Difference, as of early 2004, 55% of all US adult Internet users had broadband access to the Internet, and 24% of them had such access from home, a 60% increase over the previous year.

Third, the Internet is already a rich source of local content: at least 20% of web pages contain a well-formed North American address or telephone number. Conservative estimates of the percentage of web searches that are geographically oriented are also around 20%.

Once broadband access reaches a critical point, and transforms the Internet into an "always-on information appliance," much of the money that advertisers have been content to dedicate to traditional media will find its way to the Internet, quickly.

The rapid growth of web logs (blogs) is also cultivating the ground for local search. Blogs received initial attention as a new form of news and political commentary, and in the process demonstrated their potential as carriers of more mundane but no less valuable information. The apparently successful experiment of blogs-as-journalism has generated interest in community-oriented sites that feature user-contributed content: local news, school board meetings and other online equivalents of the news about town.
Businesses, especially small and medium enterprises (SMEs), are part of the local community and could provide the advertising revenue to sustain these "hyper-local" sites.

The Progression from Offline-derived to Internet-derived Local Content
Local Internet content originates either directly on the Internet (Internet derived) or from other, offline sources. This distinction might seem trivially obvious, but the dichotomy is useful as a gauge for the maturation of local search.

The first local search application, the IYP, was derived entirely from offline sources. It was, and is, a repurposed direct mail mailing list, compiled primarily from printed telephone directories.
Then, Yahoo! and others made Internet derived content more accessible by creating mediated indexes to valuable pages, some of which were local. Today, Google (and eventually others) geo-enables web content by locating, geocoding, and encoding addresses and phone numbers on the 20% of pages that contain them. Some of these pages contain superb local content crafted with care and precision by the web authors who create them. Still, geo-enabled web search can be like a box of Cheerios emptied into a haystack - the food is there, but it's no way to eat breakfast.

The challenge now is to rein in and organize this directly contributed local content - such content is the incontestable virtue of the web - so that it can be categorized and reliably found without imposing onerous burdens on its creators. Then, we will have progressed from a situation where virtually all local content was grafted onto the web from offline sources to one where the Internet is a thorough source of all local information: commerce, community, and civic.

A mature local search will subsume the functions of offline sources of local content with Internet derived equivalents: the Yellow Pages, the sections of newspapers related to local content, classified ads, circulars, regional magazines, direct mail.
This is not to say that these other sources of local information will necessarily become obsolete, though some will. However, the determining factors will be cultural rather than technical, preference rather than necessity.

The progression from offline to Internet derived sources of local content must be accompanied by the technical infrastructure to enable it. This infrastructure includes trust mechanisms, locally oriented search applications, descriptive standards, and user interfaces that help merchants and others create rich and standardized information. The technical parts are not particularly challenging, and precedents for most pieces either exist or are being worked on in parallel contexts. More difficult are the attitudinal and political changes that are required, such as making the Internet compelling for the millions of SMEs for whom it currently isn't, and creating Internet accessible versions of auxiliary data that might be useful, for example, as part of a trust mechanism. It would be a mistake, however, to consider these non-technical challenges daunting: a mistake that entrenched interests, such as YP publishers, might be tempted to make.

Internet-derived Local Content
In the developed world the most common way to specify a location is to denote it with a group of rigorously constrained tokens: a postal address. This is a fortunate circumstance for local search, and from it numerous benefits accrue.

The first is that the presence of an address on a web page is a strong indication that the page is relevant to local search. The correlation is not perfect, but it doesn't need to be. An address is a significant hint that search engine classification algorithms can use to help determine what a page is about.
It is a competitive necessity for search providers to continually improve their page classification capabilities. The commercial incentives for local search are enticing enough that search providers will closely scrutinize pages with addresses to correctly characterize their local content. This intense scrutiny would be too computationally costly to justify performing on all pages.

Well-formed addresses are computationally easy to find. Since postal addresses adhere to a rigid format, and usually contain at least one or two unambiguous tokens, such as a postal code, search engines can detect location information without incurring excessive costs. For the purposes of local search, recognizing a location is a syntactic, not a semantic, problem.
Local search is the beneficiary of the groundwork laid by the national postal services to ensure the efficient delivery of land mail. Thanks to them, and the decades they spent teaching the public to properly specify addresses, local search can rely on these previously established conventions.
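
To make the "syntactic, not semantic" point concrete, here is a minimal sketch in Python of how a crawler might spot one common US address shape with a single regular expression. The pattern, the list of street types, and the function name find_addresses are illustrative assumptions, not a description of how any search engine actually works; a production system would handle far more variation.

```python
import re

# Hypothetical pattern for one common US address shape:
# house number, street name, street type, city, two-letter state, ZIP code.
# The ZIP code and state are the "unambiguous tokens" that anchor the match.
ADDRESS_RE = re.compile(
    r"\b(?P<number>\d{1,6})\s+"
    r"(?P<street>[A-Za-z0-9.' ]+?)\s+"
    r"(?P<type>St|Street|Ave|Avenue|Rd|Road|Blvd|Boulevard|Dr|Drive|Ln|Lane)\.?,?\s+"
    r"(?P<city>[A-Za-z.' ]+?),\s*"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)\b"
)


def find_addresses(text):
    """Return a list of dicts, one per address-like span found in text."""
    return [match.groupdict() for match in ADDRESS_RE.finditer(text)]
```

Calling find_addresses on the text of a page returns one dictionary per candidate address, with keys such as "city", "state" and "zip". No understanding of the page is required, only pattern matching, which is why the scan is cheap enough to run over every page.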

Another advantage, and one that will resonate with the readers of this publication, is that the process of geocoding a well-formed address transforms it into an accurate latitude and longitude, and therefore into a precise map location. Geocoding databases are dynamic applications that represent significant intellectual capital. A simple address on a web page, therefore, provides access to this rich external source of information. An address is the key that enables proximity searching, map rendering, and driving directions, all of which are necessary components of local search.
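
Once addresses have been geocoded, proximity searching reduces to distance arithmetic on latitude and longitude. The sketch below, in Python, uses the standard haversine great-circle formula. The function names, the tuple layout of the listings, and the spherical-Earth radius are assumptions made for illustration; the geocoding step itself is taken as already done.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def within_radius(origin, listings, radius_km):
    """Return (name, distance_km) pairs within radius_km, nearest first.

    origin is a (lat, lon) pair; each listing is a (name, lat, lon) tuple
    whose coordinates are assumed to come from a geocoder.
    """
    lat0, lon0 = origin
    hits = []
    for name, lat, lon in listings:
        distance = haversine_km(lat0, lon0, lat, lon)
        if distance <= radius_km:
            hits.append((name, distance))
    return sorted(hits, key=lambda pair: pair[1])
```

A linear scan like this is adequate for a small directory. A large one would presumably put a spatial index in front of it, but the distance test stays the same.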

Perhaps the most intriguing aspect of the 20% of web pages that contain an address is the sheer range of local activity they represent. Some local content is rather easy to come by, such as restaurants, lodging, and vacation spots. Topics like these are so well covered that general web searches find superb content. Nevertheless, local content covered by vertical aggregators vying for users' attention is the exception rather than the rule. It is the local content that doesn't have large individual constituencies, but that taken together comprises much of what people are interested in. It is this idiosyncratic local content for which the Internet has created an efficient distribution channel where none was. And it is here that local search will shine, not by creating a marginally better YP, but by putting local people in touch with local activities with unprecedented efficiency.

The Not Too Distant Future
The print and Internet Yellow Pages have serious limitations, but they also share a key virtue. They are trusted sources of coherently organized and categorized consumer-oriented business information. And while the web is already a rich source of local content, it lacks the necessary artifacts of organization and fairness to replace either form of the Yellow Pages. While in some ways local search can do far more than the Yellow Pages, in these two fundamental aspects it can't do as much. It is not until local search exceeds the capabilities of traditional YP products in every facet that it will displace them.

Below, I propose the Internet Derived Yellow Pages (IDYP) as a framework for aggregating local information directly from the Internet, and incorporating with it descriptive metadata and mechanisms of trust.
The IDYP provides the groundwork for the more expansive version of local search whose absence I noted earlier.

Let's assume that every business, and more broadly every purveyor of a local activity, could readily create a website to describe their product, service, or activity. Even now, websites that contain a well-formed address or a telephone number would be found by a geo-enabled search engine, such as Google Local. Let's make one additional but modest assumption that web authors adopt a convention to put this descriptive information in a consistently named file or web page on each website. Such conventions are already common, as evidenced by the familiar 'about' and 'contact' pages found on most business websites. Were these assumptions to prevail, general web searches on a geo-enabled search engine would be a reasonable replacement for the Yellow Pages.

Strictly speaking, these assumptions require no new applications or infrastructure. The hardest part would be to entice reluctant SMEs to use the Internet, but as discussed earlier, the irresistibility of the Internet as an information channel for local search is a function of several culminating trends. If the contemplation of how little is necessary for the Internet to be a better Yellow Pages than the Yellow Pages is not keeping YP publishers awake at night, it should.

Still, the simple version of the IDYP described above is incomplete in two ways. First, it lacks the expressive power to sufficiently characterize local information. Second, it needs mechanisms of trust to ensure that users can rely on the IDYP as an authoritative source of local content, especially when that content is related to commerce. XML provides the framework for solving the first problem. With it, we can define a common descriptive core of metadata that pertains to all businesses, and extensions that pertain to vertical industry segments. The core would include not only the basic name, address, and phone attributes familiar from the YP, but others, like store hours, public transportation routes - whatever makes sense, and a rich categorization scheme. The extensions will allow users to perform contextually meaningful searches.
People searching for "fish" may have one of several activities in mind: dining, angling, stocking, or viewing, each differing in the attributes used to describe it.
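
As a rough illustration of the core-plus-extension idea, the Python sketch below parses two invented listings and shows how a vertical extension can disambiguate a search for "fish". Every element name, attribute name, and business here is hypothetical; no such schema has been standardized, and this is only one of many ways the metadata could be laid out.

```python
import xml.etree.ElementTree as ET

# Hypothetical listings: a shared <core> plus one vertical <extension>.
DINING = """
<listing>
  <core>
    <name>Harbor Fish House</name>
    <address>12 Dock Rd, Portland, ME 04101</address>
    <phone>207-555-0100</phone>
    <hours>Tue-Sun 17:00-22:00</hours>
    <category>fish/dining</category>
  </core>
  <extension vertical="dining">
    <attribute name="cuisine">seafood</attribute>
  </extension>
</listing>
"""

ANGLING = """
<listing>
  <core>
    <name>Lakeside Bait and Tackle</name>
    <address>88 Shore Ln, Naples, ME 04055</address>
    <phone>207-555-0199</phone>
    <category>fish/angling</category>
  </core>
  <extension vertical="angling">
    <attribute name="licenses_sold">yes</attribute>
  </extension>
</listing>
"""


def parse_listing(xml_text):
    """Split one listing into core fields, vertical name, and extension fields."""
    root = ET.fromstring(xml_text.strip())
    core = {child.tag: child.text for child in root.find("core")}
    ext = root.find("extension")
    vertical = ext.get("vertical") if ext is not None else None
    extension = ({a.get("name"): a.text for a in ext.findall("attribute")}
                 if ext is not None else {})
    return {"core": core, "vertical": vertical, "extension": extension}


def search(listings, keyword, vertical=None):
    """Return names whose category contains keyword, optionally per vertical."""
    keyword = keyword.lower()
    names = []
    for item in listings:
        if keyword not in item["core"].get("category", "").lower():
            continue
        if vertical is not None and item["vertical"] != vertical:
            continue
        names.append(item["core"]["name"])
    return names
```

With both listings parsed, a bare search for "fish" matches the restaurant and the tackle shop alike, while adding vertical="angling" narrows the result to the tackle shop. That is the kind of contextually meaningful search the extensions are meant to enable.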

The XML would be queried in one of two ways. Search applications could incorporate the XML metadata into the indexes they build for the pages that are related to local search.
(One of the core XML attributes is the URL of a web page with which an enterprise's metadata is associated, if any.) This approach allows local search to be provided as part of a general web search: the local search metadata can be thought of as footnotes associated with locally oriented pages. Alternatively, the descriptive data can be gathered into a standalone directory that provides search services tailored to local content. Two sample services are a local search query language, and an RSS notification facility that enables enterprises to quickly disseminate new and changed information.
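
The RSS notification facility could be as simple as a small feed that each enterprise republishes whenever its information changes. The Python sketch below builds such a feed with the standard RSS 2.0 channel and item elements. The function name build_feed, the shape of the updates argument, and the wording of the channel title and description are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(business_name, site_url, updates):
    """Return an RSS 2.0 document (as a string) announcing listing changes.

    updates is a list of (title, detail, when) tuples, where when is a
    timezone-aware datetime. This layout is a hypothetical convention.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"{business_name} updates"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = (
        f"New and changed information from {business_name}"
    )
    for title, detail, when in updates:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "description").text = detail
        # RSS 2.0 expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(when)
    return ET.tostring(rss, encoding="unicode")
```

A directory or search application that polls these feeds would learn of a change in store hours, for example, without having to recrawl the whole site.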

The need for trust mechanisms on the Internet is not confined to local search, and to date local search has not been an important driver in their definition. This is likely to change. Social networks refer to virtual communities built around a shared interest, and for local search would include features like user reviews of local businesses (and reviews of the reviewers), commentary, and discussion forums on community and civic activities. Certifying authorities (Thawte, Verisign, et al.) can authenticate the identity of a website, including a related physical location. Chambers of Commerce, trade organizations, and community-oriented sites can lend their imprimaturs to members in good standing.

Published Thursday, April 28th, 2005


