Directions on Cloud Computing

Directions Magazine

Harvard’s TweetMap (ALPHA): Explore 125 Million Tweets

Monday, February 25th 2013
By Adena Schutzberg

Ben Lewis of the Harvard Center for Geographic Analysis shared that TweetMap, built on MapD (a general-purpose SQL database) and Harvard's WorldMap, is up and running in ALPHA.

Officially, "TweetMap ALPHA is an instance of the MapD big data platform developed through a collaboration between Todd Mostak and Harvard CGA." I corresponded with Mostak to learn a bit about the project and its future.

TweetMap allows the exploration of some 125 million tweets posted between 12/10/2012 and 12/31/2012. Visitors can query them by time, space, and keyword. The hope is to increase the size of the database, perhaps to billions of tweets. Real-time streaming, from tweet-tweeted to tweet-on-the-map in under a second, has been implemented. MapD can run on any number of commodity graphics processing units (GPUs) and uses whatever is available to it. Todd Mostak notes, "it runs equally well on my laptop with 1 GPU as our demo server with 4 as a Dell GPU server with 16 (of course the more GPUs you have, the faster things will run and the more data you can store)." GPUs, and their role in geospatial, are covered in this Directions Magazine article.
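To make the three query dimensions concrete, here is a minimal Python sketch of filtering tweets by time window, bounding box, and keyword. It is not MapD's interface and says nothing about how MapD runs such a query on GPUs; the `Tweet` record, the `query_tweets` function, and the sample tweets are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tweet:
    """Hypothetical tweet record; the field names are assumptions."""
    text: str
    lat: float
    lon: float
    created: datetime


def query_tweets(tweets, keyword, start, end, bbox):
    """Return tweets matching a keyword, a time window, and a bounding box.

    bbox is (min_lon, min_lat, max_lon, max_lat) in decimal degrees.
    The keyword match is a case-insensitive substring test.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    needle = keyword.lower()
    return [
        t for t in tweets
        if start <= t.created <= end
        and min_lon <= t.lon <= max_lon
        and min_lat <= t.lat <= max_lat
        and needle in t.text.lower()
    ]


# Made-up sample records, not real TweetMap data.
SAMPLE = [
    Tweet("Obama speaks in Boston tonight", 42.36, -71.06,
          datetime(2012, 12, 12, 18, 0)),
    Tweet("Snow in Boston again", 42.35, -71.05,
          datetime(2012, 12, 15, 9, 30)),
    Tweet("Watching Obama on TV", 34.05, -118.24,
          datetime(2012, 12, 20, 20, 15)),
]

BOSTON_AREA = (-71.5, 42.0, -70.5, 43.0)
hits = query_tweets(
    SAMPLE,
    "obama",
    datetime(2012, 12, 10),
    datetime(2012, 12, 31, 23, 59, 59),
    BOSTON_AREA,
)
print([t.text for t in hits])
```

A linear scan like this is fine for a handful of records; at 125 million tweets the same three predicates would need indexing or massively parallel evaluation, which is the gap a GPU database aims to fill.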

Harvard users (with a log-in) can even download the tweets found by their queries. The rest of us can see the results as individual "dots" (with details of the tweet content, data, lat/long, etc.) and/or see a heat map. The one at right is a query for "Obama" across the entire time frame. I also searched for "adena" and found but a handful - many around a geography with that name.

What's next? Mostak shares:

...we will soon allow for spatial joins/intersections of points to polygons.  This means that the user could upload an arbitrary shapefile of say census districts and basically find the average sentiment of tweets containing the word "Obama" in each district and then regress that against attribute data, such as income or education level for the district.  On 4 GPUs we should be able to do around 4 billion such joins per second, as opposed to PostGIS or ArcGIS which seem to top out at 10,000-20,000 such operations per second, allowing real-time choroplething and regression analysis of spatial data for datasets which might take PostGIS or ArcGIS many days to do the same thing.
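The point-to-polygon join Mostak describes can be sketched in a few lines of Python. This is only an illustration of the idea, assuming simple polygons given as lists of (x, y) vertices and a precomputed sentiment score per tweet; it uses the standard ray-casting test and a plain loop, not MapD's GPU implementation. The function names, the district shapes, and the sentiment values are all made up for the example.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon?

    polygon is a list of (x, y) vertices in order. Points that lie
    exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mean_sentiment_by_district(points, districts):
    """Join points to polygons and average sentiment per polygon.

    points: iterable of (x, y, sentiment) tuples.
    districts: dict mapping a district name to its vertex list.
    Each point is assigned to the first district that contains it.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for x, y, sentiment in points:
        for name, polygon in districts.items():
            if point_in_polygon(x, y, polygon):
                totals[name][0] += sentiment
                totals[name][1] += 1
                break
    return {name: total / count for name, (total, count) in totals.items()}


# Hypothetical districts (unit squares) and made-up sentiment scores.
DISTRICTS = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
POINTS = [
    (0.5, 0.5, 0.25),
    (0.25, 0.75, 0.75),
    (1.5, 0.5, -0.5),
    (3.0, 3.0, 1.0),  # falls outside every district and is ignored
]

averages = mean_sentiment_by_district(POINTS, DISTRICTS)
print(averages)
```

The per-district averages are what would then be regressed against attributes such as income or education level. Taken at face value, the figures quoted above (about 4 billion joins per second against 10,000-20,000) imply a speedup of roughly 200,000 to 400,000 times.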
