Local Search: State of the Art and Challenges Ahead
If Internet users have heard of local search, they are
likely to think of it as the Internet equivalent to the Yellow Pages.
This association is natural, because the Internet Yellow Pages (IYP)
have been one of the most popular applications since Yahoo! made the
Internet accessible to the general public. That was ten years ago, and
the state of the art in local search has remained distressingly
stagnant since.
Even people involved with local search commonly use the term to refer
to the techniques and services advertisers employ to improve the
return on the money they spend on pay-per-click (PPC) advertising.
These paid listings embroider IYP results, city guides and other
geographically oriented applications.
There is a natural tendency to conflate local search on the Internet,
still in its nascent stages, with the Yellow Pages, a trusted brand
name, and an almost idiomatic expression for any source of easily
accessible information on storefront businesses. Local search as a
synonym for the Yellow Pages reflects today's state of the art, and it
will continue to be thought of in this restrictive sense until a more
expansive breed of applications becomes available.
A more expansive definition is inevitable. There are few applications
on the Internet where the gulf between what is and what should be is so
stark. Local search will become a fundamentally useful Internet
application. It will encompass the geographically oriented activities
of daily life: "what is in my neighborhood and what is
happening in my neighborhood." It's the combination of commerce,
community and civics.
The second fundamental catalyst is the widespread availability of
broadband connectivity to the Internet. High-speed access is the single
most important predictor of the intensity of an individual's Internet
usage. According to a report published by the Pew Internet Project, The
Broadband Difference, as of early 2004, 55% of all US adult
Internet users had broadband access to the Internet, and 24% of them
had such access from home, a 60% increase over the previous year.
Third, the Internet is already a rich source of local content: at least
20% of web pages contain a well-formed North American address or
telephone number. Conservative estimates of the percentage of web
searches that are geographically oriented are also around 20%.
Once broadband access reaches a critical point and transforms the
Internet into an "always-on information appliance," much of the money
that advertisers have been content to dedicate to traditional media
will find its way to the Internet, quickly.
The rapid growth of web logs (blogs) is also cultivating the ground for
local search. Blogs received initial attention as a new form of news
and political commentary, and in the process demonstrated their
potential as carriers of more mundane but no less valuable information.
The apparently successful experiment of blogs-as-journalism has
generated interest in community-oriented sites that feature
user-contributed content: local news, school board meetings and other
online equivalents of the news about town. Businesses, notably small
and medium enterprises (SMEs), are part of the local community and
could provide the advertising revenue to sustain these sites.
The Progression from Offline-derived to Internet-derived Local Content
Local Internet content originates either directly on the Internet
(Internet derived) or from other, offline sources. This distinction
might seem trivially obvious, but the dichotomy is useful as a gauge
for the maturation of local search.
The first local search application, the IYP, was derived entirely from
offline sources. It was, and is, a repurposed direct mail mailing list,
compiled primarily from printed telephone directories. Then, Yahoo! and
others made Internet derived content more accessible by creating
mediated indexes to valuable pages, some of which were local. Today,
Google (and eventually others) geo-enables web content by locating,
geocoding, and encoding addresses and phone numbers on the 20% of pages
that contain them. Some of these pages contain superb local content
crafted with care and precision by the web authors who create them.
Still, geo-enabled web search can be like a box of Cheerios emptied
into a haystack - the food is there, but it's no way to eat breakfast.
The challenge now is to rein in and organize this directly contributed
local content - such content is the incontestable virtue of the web -
so that it can be categorized and reliably found without imposing
onerous burdens on its creators. Then, we will have progressed from a
situation where virtually all local content was grafted onto the web
from offline sources to one where the Internet is a thorough source of
all local information: commerce, community, and civic.
A mature local search will subsume the functions of offline sources of
local content with Internet derived equivalents: the Yellow Pages, the
sections of newspapers related to local content, classified ads,
circulars, regional magazines, direct mail. This is not to say that
these other sources of local information will necessarily become
obsolete, though some will. However, the determining factors will be
cultural rather than technical, preference rather than necessity.
The progression from offline to Internet derived sources of local
content must be accompanied by the technical infrastructure to enable
it. This infrastructure includes trust mechanisms, locally oriented
search applications, descriptive standards, and user interfaces that
help merchants and others create rich and standardized information. The
technical parts are not particularly challenging, and precedents for
most pieces either exist or are being worked on in parallel contexts.
More difficult are the attitudinal and political changes that are
required, such as making the Internet compelling for the millions of
SMEs for whom it currently isn't, and creating Internet accessible
versions of auxiliary data that might be useful, for example, as part
of a trust mechanism. It would be a mistake, however, to consider these
non-technical challenges daunting: a mistake that entrenched interests,
such as YP publishers, might be tempted to make.
Internet-derived Local Content
In the developed world the most common way to specify a location is to
denote it with a group of rigorously constrained tokens: a postal
address. This is a fortunate circumstance for local search, and from it
numerous benefits accrue.
The first is that the presence of an address on a web page is a
strong indication that the page is relevant to local search. The
correlation is not perfect, but it doesn't need to be. An address is a
significant hint that search engine classification algorithms can use
to help determine what a page is about. It is a
necessity for search providers to continually improve their page
The commercial incentives for local search are enticing enough that
search providers will closely scrutinize pages with addresses to
characterize their local content. This intense scrutiny would be too
computationally costly to justify performing on all pages.
Well-formed addresses are computationally easy to find. Since postal
addresses adhere to a rigid format, and usually contain at least one or
two unambiguous tokens, such as a postal code, search engines can
detect location information without incurring excessive costs. For the
purposes of local search, recognizing a location is a syntactic, not a
semantic, problem. Local search is the beneficiary of the groundwork
laid by the national postal services to ensure the delivery of land
mail. Thanks to them, and the decades they spent teaching the public
to properly specify addresses, local search can rely on these
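To illustrate why address recognition can be treated as a syntactic problem, here is a minimal sketch in Python. It assumes a simplified US-style format (street number, street name, a street-type token, city, two-letter state, ZIP code); the pattern, the list of street types, and the function name find_addresses are illustrative assumptions, not a description of how any search engine actually works.

```python
import re

# Assumed, simplified US-style address pattern. Real postal formats vary
# far more than this; the street-type list is deliberately short.
ADDRESS_RE = re.compile(
    r"\b(?P<number>\d{1,6})\s+"
    r"(?P<street>[A-Za-z0-9.' ]+?\s"
    r"(?:St|Street|Ave|Avenue|Blvd|Rd|Road|Dr|Drive|Ln|Way)\.?)"
    r",?\s+(?P<city>[A-Za-z.' ]+?),\s*"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)\b"
)


def find_addresses(text):
    """Return one dict of named parts per address-like match in text."""
    return [m.groupdict() for m in ADDRESS_RE.finditer(text)]


# Hypothetical page text used only for demonstration.
page = "Visit Joe's Bait Shop at 123 Main St, Springfield, IL 62701 today."
for address in find_addresses(page):
    print(address["street"], "|", address["city"], "|", address["zip"])
```

The ZIP code is the unambiguous token that anchors the match; everything before it is located by position alone, with no attempt to understand what the page is about.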
Another advantage, and one that will resonate with the readers of this
publication, is that the process of geocoding a well-formed address
converts it into an accurate latitude and longitude, and therefore into
a map location. Geocoding databases are dynamic applications that
represent significant intellectual capital. A simple address on a web
page, therefore, provides access to this rich external source of
information. An address is the key that enables proximity
searching, map rendering, and driving directions, all of which are
necessary components of local search.
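Once an address has been geocoded, proximity searching reduces to a distance calculation. The sketch below assumes the coordinates have already been produced by some geocoder and uses the haversine great-circle formula; the function names and the sample listings are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(origin, listings, radius_km):
    """Return (distance_km, name) pairs inside the radius, nearest first.

    origin is (lat, lon); listings is an iterable of (name, lat, lon).
    """
    hits = []
    for name, lat, lon in listings:
        distance = haversine_km(origin[0], origin[1], lat, lon)
        if distance <= radius_km:
            hits.append((distance, name))
    return sorted(hits)


# Hypothetical geocoded listings, for demonstration only.
listings = [("bakery", 0.0, 0.01), ("hardware store", 0.0, 1.0)]
print(within_radius((0.0, 0.0), listings, radius_km=5.0))
```

A linear scan like this is adequate for a sketch; a production directory would presumably use a spatial index rather than compare the searcher against every listing.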
Perhaps the most intriguing aspect of the 20% of web pages that contain
an address is the sheer range of local activity they represent. Some
local content is rather easy to come by, such as restaurants, lodging,
and vacation spots. Topics like these are so well covered that general
web searches find superb content. Nevertheless, local content covered
by vertical aggregators vying for users' attention is the exception
rather than the rule. It is the local content that doesn't have large
individual constituencies, but that taken together comprises much of
what people are interested in. It is this idiosyncratic local content
for which the Internet has created an efficient distribution channel
where none was. And it is here that local search will shine, not by
creating a marginally better YP, but by putting local people in touch
with local activities with unprecedented efficiency.
The Not Too Distant Future
The print and Internet Yellow Pages have serious limitations, but they
also share a key virtue. They are trusted sources of coherently
organized and categorized consumer-oriented business information. And
while the web is already a rich source of local content, it lacks the
necessary artifacts of organization and fairness to replace either form
of the Yellow Pages. While in some ways local search can do far more
than the Yellow Pages, in these two fundamental aspects it can't do as
much. It is not until local search exceeds the capabilities of
traditional YP products in every facet that it will displace them.
Below, I propose the Internet Derived Yellow Pages (IDYP) as a
framework for aggregating local information directly from the Internet,
and incorporating with it descriptive metadata and mechanisms of trust.
The IDYP provides the groundwork for the more expansive version of
local search whose absence I noted earlier.
Let's assume that every business, and more broadly every purveyor of a
local activity, could readily create a website to describe their
product, service, or activity. Even now, websites that contain a
well-formed address or a telephone number would be found by a
geo-enabled search engine, such as Google Local. Let's make one
additional but modest assumption: that web authors adopt a convention
to put this descriptive information in a consistently named file or web
page on each website. Such conventions are already common, as evidenced
by the familiar 'about' and 'contact' pages found on most business
websites. Were these assumptions to prevail, general web searches on a
geo-enabled search engine would be a reasonable replacement for the
Yellow Pages.
Strictly speaking, these assumptions require no new applications or
infrastructure. The hardest part would be to entice reluctant SMEs to
use the Internet, but as discussed earlier, the irresistibility of the
Internet as an information channel for local search is a function of
several culminating trends. If the contemplation of how little is
necessary for the Internet to be a better Yellow Pages than the Yellow
Pages is not keeping YP publishers awake at night, it should.
Still, the simple version of the IDYP described above is incomplete in
two ways. First, it lacks the expressive power to sufficiently
characterize local information. Second, it needs mechanisms of trust to
ensure that users can rely on the IDYP as an authoritative source of
local content, especially when that content is related to commerce. XML
provides the framework for solving the first problem. With it, we can
define a common descriptive core of metadata that pertains to all
businesses, and extensions that pertain to vertical industry segments.
The core would include not only the basic name, address, and phone
attributes familiar from the YP, but also others, such as store hours
and public transportation routes (whatever makes sense), and a rich
categorization scheme. The extensions will allow users to perform
contextually meaningful searches. People searching for "fish" may have
one of several activities in mind: dining, angling, stocking, or
viewing, each differing in the attributes used to describe it.
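As a rough illustration of the core-plus-extension idea, the following Python sketch parses one hypothetical metadata record. No such schema has been defined here: the element names (listing, core, extension, segment, and so on) and the sample business are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: a shared core plus an extension for one vertical
# segment. Every element and attribute name here is an assumption.
RECORD = """
<listing>
  <core>
    <name>Example Fish House</name>
    <address>123 Main St, Springfield, IL 62701</address>
    <phone>555-0100</phone>
    <hours>Tu-Su 11:00-22:00</hours>
    <category>dining</category>
    <url>http://www.example.com/about</url>
  </core>
  <extension segment="dining">
    <cuisine>seafood</cuisine>
    <reservations>yes</reservations>
  </extension>
</listing>
""".strip()


def parse_listing(xml_text):
    """Split a record into its core fields and its segment extension."""
    root = ET.fromstring(xml_text)
    core = {child.tag: child.text for child in root.find("core")}
    ext = root.find("extension")
    segment = ext.get("segment") if ext is not None else None
    extension = {c.tag: c.text for c in ext} if ext is not None else {}
    return {"core": core, "segment": segment, "extension": extension}


def matches(listing, keyword, activity):
    """True if keyword occurs in any field and the category is activity."""
    values = list(listing["core"].values())
    values += list(listing["extension"].values())
    found = any(keyword.lower() in (v or "").lower() for v in values)
    return found and listing["core"].get("category") == activity


listing = parse_listing(RECORD)
print(matches(listing, "fish", "dining"))   # keyword and activity agree
print(matches(listing, "fish", "angling"))  # same keyword, other activity
```

The point of the second function is the "fish" example above: the keyword alone is ambiguous, and the category carried in the metadata is what separates a restaurant from a bait shop.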
The XML would be queried in one of two ways. Search applications could
incorporate the XML metadata into the indexes they build for the pages
that are related to local search. (One of the core XML attributes is
the URL of a web page with which an enterprise's metadata is
associated, if any.) This approach allows local search to be provided
as part of a general web search: the local search metadata can be
thought of as footnotes associated with locally oriented pages.
Alternatively, the descriptive data can be gathered into a standalone
directory that provides search services tailored to local content. Two
sample services are a local search query language, and an RSS
notification facility that enables enterprises to quickly disseminate
new and changed information.
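The notification facility could be as simple as a feed that each enterprise, or the directory on its behalf, republishes when something changes. The sketch below builds a minimal RSS 2.0 document with Python's standard library; the function name build_feed, the shape of the update dictionaries, and the sample URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(title, link, description, updates):
    """Return an RSS 2.0 document (as a string) listing the updates.

    Each update is a dict with 'title', 'link', 'description', and
    'when' (a timezone-aware datetime). This layout is hypothetical.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for update in updates:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = update["title"]
        ET.SubElement(item, "link").text = update["link"]
        ET.SubElement(item, "description").text = update["description"]
        # RSS 2.0 expects RFC 822 style dates in pubDate.
        ET.SubElement(item, "pubDate").text = format_datetime(update["when"])
    return ET.tostring(rss, encoding="unicode")


# Hypothetical change announced by a business.
updates = [{
    "title": "New store hours",
    "link": "http://www.example.com/hours",
    "description": "Now open Sundays, 10:00-16:00.",
    "when": datetime(2004, 6, 1, 12, 0, tzinfo=timezone.utc),
}]
print(build_feed("Example Fish House", "http://www.example.com/",
                 "Changes to our listing", updates))
```

A directory that polls such feeds would learn of changed hours or a new address without re-crawling the whole site.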
The need for trust mechanisms on the Internet is not confined to local
search, and to date local search has not been an important driver in
their definition. This is likely to change. Social networks refer to
virtual communities built around a shared interest, and for local
search would include features like user reviews of local businesses
(and reviews of the reviewers), commentary, and discussion forums on
community and civic activities. Certifying authorities (Thawte,
Verisign, et al.) can authenticate the identity of a website, including
a related physical location. Chambers of Commerce, trade organizations,
and community-oriented sites can lend their imprimaturs to members in