Directions Magazine (DM): Fwix states that it "organizes the world’s information by location." This is not a new idea. What does Fwix bring to the table that's different than what came before (MetaCarta, now part of Nokia, or .geo, to name two efforts)?
Darian Shirazi (DS): MetaCarta licensed its geotagging technology, and focused on serving the enterprise markets, primarily in the public sector.
In the past, the common misconception about geotagging has been that the technology itself is the main value. MetaCarta and other services have offered - at a per-document charge - the ability to geotag articles and other content. We believe that is the wrong way to look at the space. Just as Google provided both a system for indexing the Web and the index behind that system, we’re offering a geotagger plus an index. Combined, the two allow publishers to feed new content in, and developers and users to extract valuable information from the index.
By making the index bi-directional -- input and output -- we’re able to help publishers get distribution and use data to better target local advertisements. That’s why we’re giving away our Geotagger for free.
On top of this, as mobile devices become the way people consume content, we believe location will become more important than ever. For the first time, we have a device that is designed to be the center of a person’s life, and it knows where that person is. Location is a very relevant signal of what a user wants, and being the index behind these new devices is where we believe the billion-dollar opportunity in this space lies.
DM: Fwix started as a company aimed at filtering local news and information and expanded to filtering social media and other content. What are the special challenges of geotagging the former (news/info) versus the latter (social media/etc.)?
DS: Geotagging in general is complex, and all online content, whether news or social media, poses major challenges. While a lot of the social media content that comes from mobile phones (tweets, photos taken on mobile, etc.) already carries geo-data we can use as input to our geotagging system, a significant amount of social media content still lacks associated geo-metadata. Social media content is also largely unstructured and user generated, so users are likely to reference the same location in different ways, which makes entity extraction even more challenging (though this issue exists in geotagging news content as well).

Entity extraction is the process of taking a piece of content and figuring out what locations it mentions or what locations it’s about. It requires analyzing the content linguistically to understand entities, including what is a proper noun and what isn’t. A common example is a restaurant called “Hog ‘N Rocks” in the Mission District here in San Francisco. It may be referred to as “Hogs & Rocks” or “Hogs and Rocks” or “H N Rocks.” You want to catch all of these variants and be able to say this article is about these restaurants and these lat/longs.
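Fwix has not published how its entity matcher works, but the name-variant problem Shirazi describes can be illustrated with a minimal sketch: normalize the connector words (“and”, “&”, “‘n”) and punctuation out of a place name, then compare the result against a canonical taxonomy entry with a fuzzy similarity score. The names and threshold below are made up for illustration.

```python
# Illustrative sketch only (not Fwix's actual algorithm): matching noisy
# place-name variants against a canonical places entry by normalizing
# connector words and punctuation, then fuzzy-comparing with the stdlib.
import re
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase, unify the and/&/'n/n connectors, strip punctuation."""
    name = name.lower()
    name = re.sub(r"\s+(?:and|&|'n|n)\s+", " and ", name)
    name = re.sub(r"[^a-z0-9 ]", "", name)
    return re.sub(r"\s+", " ", name).strip()

def matches(candidate: str, canonical: str, threshold: float = 0.8) -> bool:
    """True if the normalized names are similar enough to be the same place."""
    a, b = normalize(candidate), normalize(canonical)
    return SequenceMatcher(None, a, b).ratio() >= threshold

canonical = "Hog 'N Rocks"  # canonical taxonomy entry
variants = ["Hogs & Rocks", "Hogs and Rocks", "H N Rocks"]
hits = [v for v in variants if matches(v, canonical)]  # all three match
```

A production system would also weigh geographic context (an article already mentioning the Mission District makes the restaurant reading more likely), which simple string similarity cannot capture.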
Now, entity extraction is a linguistic problem and one we’ve been working on for two years. We are constantly improving the Geotagger, and we’ve gotten to the point where its accuracy improves daily, though in small increments rather than drastic jumps. In addition to the linguistic problem, we have also had to build a large taxonomy against which entities are matched. This taxonomy is a large places database combining data we’ve crawled and data we’ve licensed. With several crawled data sources, the challenge is de-duping the taxonomy and creating a structure that lets us quickly match entities against it.
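The de-duping challenge Shirazi mentions can be sketched simply: when the same place is crawled from multiple sources, records can be merged by comparing normalized names and coordinate proximity. This is an illustrative toy, not Fwix’s pipeline; the 100-meter threshold and sample records are assumptions.

```python
# Illustrative sketch (not Fwix's actual pipeline): de-duplicating place
# records crawled from multiple sources. Two records count as the same
# place when their normalized names match and their coordinates fall
# within ~100 meters of each other.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def dedupe(records, max_dist_m=100.0):
    """Keep the first record seen for each (normalized name, nearby point)."""
    kept = []
    for name, lat, lon in records:
        key = " ".join(name.lower().split())
        duplicate = any(
            key == " ".join(k_name.lower().split())
            and haversine_m(lat, lon, k_lat, k_lon) <= max_dist_m
            for k_name, k_lat, k_lon in kept
        )
        if not duplicate:
            kept.append((name, lat, lon))
    return kept

places = [
    ("Dolores Park", 37.7596, -122.4269),   # source A
    ("Dolores  Park", 37.7597, -122.4270),  # source B, near-duplicate
    ("Dolores Park", 37.7400, -122.4100),   # different place, same name
]
unique = dedupe(places)  # the near-duplicate is dropped
```

The linear scan here is for clarity; at database scale one would bucket candidates with a spatial index (e.g. a geohash or R-tree) before comparing, which is the “structure that lets us quickly match entities” point in the answer above.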
DM: You note the free Geotagger app provides publishers with data on locations within its content, such as mentions of Boston and Vancouver in a story about the Stanley Cup playoffs. How would that be used to drive advertising? Or would it be combined with other information, such as the location of the person consuming the content?
DS: The Geotagger can identify the places referenced in a story (for example, a restaurant, business, or neighborhood). Publishers can then feed the location data it identifies into their location-targeted advertising solutions. Knowing which places are referenced within content lets advertisers create and refine location-targeting campaigns down to the level of an individual place.
A good example of this is the ability to target daily deals more effectively. If you happen to be reading an article about the Stanley Cup, you’re based in Vancouver, and the article carries our Geotagger, we’ll be able to find the most relevant Groupon that matches the article’s content -- maybe a deal for a free hockey lesson or 50% off all Canucks merchandise at the local sports shop. This level of targeting, in conjunction with the massive local business model that Groupon and others have created, will be how daily deal services increase sales even further. Such location-context targeting has really never been done before.
DM: What are other examples of how the Geotagger app can serve content publishers beyond location-based advertising?
DS: As mentioned in the first question, the Geotagger enables publishers to add a location-relevant layer to their content, which is of value to both consumers and publishers. Consumers can discover relevant, nearby content and find out what’s important right here or close by. Location is now a key contextual signal - whether it comes from a geo-enabled mobile device or an HTML5-enabled Web browser on a PC with location detection built in.
We’re also looking at ways to integrate local search directly into publisher products, but we’re not ready to talk fully about those yet.
DM: At Where 2.0 this year you talked about an open places database. Is there any real motion on such a thing? Has SimpleGeo’s release of its Places database had any impact?
DS: Our new API launch was a step toward that goal. Developers can freely build geo-enabled apps on top of our API, and our goal is to make the platform completely open. We do enforce a rate limit to prevent spam and abuse of the API. However, if a developer builds an amazing app, we have a provision that allows those restrictions to be relaxed automatically.
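Fwix hasn’t described how its rate limit is enforced; a common mechanism for the kind of throttling mentioned here is a token bucket, sketched below with made-up parameters (1 request/second sustained, bursts of 5).

```python
# Illustrative token-bucket rate limiter, the kind of mechanism an API
# like the one described could use to throttle abusive clients. (Fwix has
# not published its implementation; rates and names here are assumptions.)
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec        # sustained allowed request rate
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # bucket starts full
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Refill tokens for elapsed time, then try to spend one."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate_per_sec=1.0, capacity=5)
results = [bucket.allow() for _ in range(6)]  # burst of 6 back-to-back requests
```

Relaxing the limit for a trusted developer, as the answer describes, would then be a matter of raising that client’s `rate_per_sec` and `capacity`.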
Plus, we think that an open places database is a lot more than just a list of places -- it’s a list of places and content, it’s a list of places and the reviews associated with those businesses, it’s a list of parks and the status updates from those parks, etc. It’s the context around a place. Additionally, an open places database is something to which you can contribute. We want our users and developers and everyone to come back to us with new information about places, and post to our API.
This all links back to our strategy of being an index that is both input and output. We want developers and publishers to be able to submit information to the index, get it geo-tagged, and extract information from the index as well. By being an input/output index, we’re able to serve developers and publishers completely.
DM: What is the state of hyperlocal media? There was a bit of a boom back in the day with Topix, then EveryBlock (now part of MSNBC.com) and now AOL's Patch (now part of the Huffington Post Media Group). Is aggregation (Topix/EveryBlock/Fwix) the future or is it local content (Patch/NPR)?
DS: Fwix started off by geotagging news content, but our vision is to index the Web by location, which means including all types of content and webpages.
Needless to say, hyperlocal media continues to grow rapidly, as more media publishers (Gannett, Washington Post, AOL/Patch, etc.) launch hyperlocal projects and aggregation services. And the opportunity is well known: BIA/Kelsey expects local search advertising to reach $8.2 billion by 2015, and local advertising overall to reach $35 billion by 2014.
The challenge with hyper-local is finding equilibrium between market reach and cost. In other words, a hyper-local product needs to serve millions of people in many markets without costing an arm and a leg. In AOL’s case, it has spent nearly “$100M to set up” small outposts of journalism farms that produce hundreds of hyper-local stories each day. Even after pouring such a large amount of money into the development of local content, it is still only covering about 10% of the neighborhoods that “matter” in the U.S. With plans to expand internationally and cover all worthy neighborhoods in the world, AOL is looking at a billion dollar expense. This isn’t the solution to satisfying the need for improved local content country-wide. At this rate of expansion — even if AOL had billions to spend — we would see only half of the densely populated neighborhoods covered within the next two years.

I’m using AOL as an example here because it is the only company that has really gone all-in with the local content strategy. Most of these hyper-local media sites are run like old media, versus new media. Old media companies are “technology versus editorial,” where editors have absolute control, and new media companies are “technology plus editorial.”
Neither aggregation nor original content is the future of hyper-local media. The future belongs to the company that properly serves the local search needs of mobile devices, just as Google served the needs of computers in the late ‘90s and early 2000s. The local search company that can geo-tag the Web will serve the best content to mobile users. This will, in turn, drive traffic to content created by media companies and best monetize that content by delivering daily deals and local advertisements alongside it.