The University of North Carolina's Ryan Thornburg discusses the future of data journalism and digital public records at rural newspapers. He describes the use of OpenBlock, a tool that takes structured and unstructured data and plots it on a map so that data consumers can home in on local news.
The key lesson from the first year of our Knight News Challenge grant? The future of data journalism and digital public records at rural newspapers depends on building a product that balances the needs of three key audiences: the consumers, professional reporters and content sponsors.
The first year of work on OpenBlock Rural focused on successfully overcoming some unknown unknowns with the application. The year ahead has our attention focused on the challenges of building sustainable revenue that will support the product's editorial mission of more informed communities.
OPENBLOCK'S END USERS
Since its inception, OpenBlock has been primarily a publishing tool -- taking structured and unstructured data and plotting it on a map so that consumers could home in on news that was relevant because of its proximity. The presumed need for such a tool has been grounded in both the editorial and business mission of news organizations. Assuming the newsroom is already collecting a dataset and using it to create broad stories for a mass audience, chopping up and delivering pieces of the data to smaller but more relevant audiences is a great way of reducing waste in the news manufacturing process. A minor act of vandalism may hold no interest for a mass audience but great interest for the few people who live nearby.
The challenge with using OpenBlock to sweep up and reconstitute the scraps of the mass-audience reporting process is that the data that's most accessible is not often the data that's most interesting to consumers.
Crime data -- a driver of online traffic for many local news sites -- is rarely current and rarely completely online in rural communities. Even in cities it is almost always a week old.
Education data and government salary data are also popular, but they fail to be geographically compelling and don't change very often. OpenBlock's audience wants an application that can be searched by more than just "when" and "where."
FOR REPORTERS: A DATA DASHBOARD
With the right sets of data, OpenBlock can do a great job showing individual users a wide variety of news items that are happening near their homes. But the most valuable news is reporting on what's going on behind the scenes. OpenBlock can't achieve its full value in a news organization without alerting reporters to stories they might not otherwise see.
That's why we're developing a new "data dashboard" feature for OpenBlock that will allow reporters to quickly see trends across time as well as compare data from different towns or neighborhoods. This dashboard doesn't just show where the car is going. It also serves as an early warning system that will be designed to alert reporters to outliers -- the kinds of oddity or magnitude that make big news out of small points of data.
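To make the idea of an outlier alert concrete, here is a minimal sketch of one common approach. It is not the dashboard's actual code; the function name, the threshold and the weekly counts are all invented for illustration. It flags values that sit far from the median, using the median absolute deviation rather than the mean so that a single spike does not hide itself.

```python
from statistics import median


def flag_outliers(counts, threshold=3.5):
    """Return the positions of values that sit unusually far from the median.

    Uses a modified z-score built on the median absolute deviation (MAD).
    The 3.5 cutoff is a conventional default, not a value from OpenBlock.
    """
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        # At least half the values equal the median, so the score is
        # undefined; this sketch simply flags nothing in that case.
        return []
    return [
        i for i, c in enumerate(counts)
        if 0.6745 * abs(c - med) / mad > threshold
    ]


# Hypothetical weekly incident counts for one neighborhood.
weekly_counts = [4, 5, 3, 6, 4, 5, 19, 4]
print(flag_outliers(weekly_counts))  # prints [6], the week with 19 incidents
```

A reporter would still need to decide whether a flagged week is a story or a data-entry error; the check only says where to look.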
In the first year, we've already given journalists more control over the data inside OpenBlock by building a module that more easily allows producers to see when and why data fails to geocode. Producers can then easily correct misspellings or other "dirty data" errors that previously just failed silently without notifying either journalists or the end-users of OpenBlock.
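The general pattern behind that kind of module can be sketched in a few lines. This is an illustration under stated assumptions, not the code in our repository: `geocode_all`, `toy_geocode` and the sample address and coordinates are hypothetical stand-ins. The point is that every record that fails to geocode is kept, along with the reason, instead of being dropped silently.

```python
def geocode_all(records, geocode):
    """Split records into located and failed, keeping a reason for each failure.

    `geocode` is any callable that returns (lat, lon) or raises ValueError.
    """
    located, failed = [], []
    for record in records:
        try:
            lat, lon = geocode(record["address"])
        except ValueError as err:
            # Keep the record and the reason so a producer can fix it later.
            failed.append({"record": record, "reason": str(err)})
        else:
            located.append({**record, "lat": lat, "lon": lon})
    return located, failed


# Stand-in geocoder for illustration only; the coordinates are made up.
KNOWN_ADDRESSES = {"100 Main St": (34.33, -78.70)}


def toy_geocode(address):
    if address not in KNOWN_ADDRESSES:
        raise ValueError(f"no match for {address!r}")
    return KNOWN_ADDRESSES[address]


located, failed = geocode_all(
    [{"address": "100 Main St"}, {"address": "100 Mian St"}],
    toy_geocode,
)
for item in failed:
    print(item["record"]["address"], "->", item["reason"])
```

Here the misspelled "Mian" ends up in `failed` with its reason attached, which is the list a producer would work through to correct dirty data.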
These improvements are already available on our GitHub for you to use, improve or adapt to your needs.
FOR CONTENT SPONSORS: AN ENGAGED AUDIENCE
OpenBlock only works if it generates at least as much annual revenue as it costs to produce, so everything we do keeps in mind the audience of potential content sponsors.
The small, rural newspaper our project targets will never achieve the scale of audience needed to make a profit off of rapidly falling cost-per-impression online advertising. That's why we're working with The Whiteville News-Reporter to develop a sales and pricing strategy that positions OpenBlock as part of a larger cross-platform sponsorship. These sponsorships provide an opportunity to engage with customers regularly online, creatively in print and personally at topic-specific events. Each piece adds value to the others, and works best in communities such as Whiteville where the newspaper already commands industry-leading loyalty.
My colleague Penny Abernathy, the Knight Chair in Digital Media Economics here at UNC, is leading students in a project that will teach local newspaper sales staffs how to explain the value of OpenBlock sponsorships to businesses whose success depends on developing a trusted and lasting relationship with rural audiences. This is one of the ways that we've been able to leverage our School's diversity among advertising, public relations and reporting to show students how to take a holistic approach to editorial product development. They've already had success with this approach to high school sports news in Whiteville.
FOR AUDIENCES: RELEVANCE AND USABILITY
The work we did with OpenBlock in 2012 was incredibly resource intensive, and the bottom line is that unless we can make our successes in Whiteville easily replicable elsewhere we will not have a sustainable business that meets the information needs of the rural communities we set out to serve. We must lower the cost of deployment, sales training, and public records acquisition.
The cost of deployment right now is kept high by two factors: the amount of location-based customization the code still requires and the cost of creating a front-end design that matches each client's website.
Developers who specialize in Django and GIS demand some of the highest hourly rates in the industry. The less dependent installation of OpenBlock is on those high-priced skills, the more we will be able to focus those resources on new feature development.
We need complete and accurate geographic data for rural communities, which we don't have. We learned something we probably should have known when we started -- that U.S. Census data is missing address ranges on about 60 percent of the road segments in rural areas. It was a good reminder of the reason we wanted to focus on rural communities -- we're amplifying a challenge that doesn't get much discussion at conferences dominated by urban innovators.
We've solved this problem in the short term by obtaining streets data from counties, but this solution isn't sustainable. It's not replicable at all in the 34 of North Carolina's 60 rural counties that don't have road data available for online download. The road data that the state Department of Transportation provides online is several years out of date and is missing major interstates around Raleigh. My informal request for more complete, current roads data was denied. That conversation with the DOT will continue in 2013.
We also need to keep working on the best way for multiple installations of OpenBlock to share a single set of data. For example, we are scraping the Secretary of State's new corporation filings. That site has data for every county, and we need to make it easy for new installations to grab only the data in which their communities are interested. But there are relatively few statewide datasets that are so easy to scrape and keep current.
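One simple way to picture that sharing is a single statewide scrape that each installation filters down to its own counties. The sketch below assumes each scraped record carries a `county` field; that field name, the function name and the sample filings are hypothetical, not the format we actually store.

```python
def records_for_counties(records, counties):
    """Return only the records whose county is in the installation's list.

    Matching ignores case and surrounding whitespace, since scraped
    values are rarely formatted consistently.
    """
    wanted = {c.strip().lower() for c in counties}
    return [
        r for r in records
        if r.get("county", "").strip().lower() in wanted
    ]


# Hypothetical statewide filings, as a shared scraper might emit them.
statewide = [
    {"name": "Example Farms LLC", "county": "Columbus"},
    {"name": "Sample Tech Inc", "county": "Wake"},
    {"name": "Demo Hardware Co", "county": " COLUMBUS "},
]

local = records_for_counties(statewide, ["Columbus"])
print([r["name"] for r in local])  # prints ['Example Farms LLC', 'Demo Hardware Co']
```

The scraping cost is paid once, and each new installation only has to supply its list of counties.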
Scaling the cost of installation also requires us to streamline the design of the site so that each installation isn't a complete template customization. It will be important for us to understand how much this kind of customization is worth to local papers, and how many would be willing to pay for a basic choice of color palettes and text branding.
Our data acquisition process also needs to scale. Right now, very little government data is current, complete, online and granular. Of the data that is online, much of it needs to be scraped out of forms and HTML pages. And when the structure of the forms and results changes, the scrapers must be edited. That drives up the cost of maintenance in an unpredictable way.
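One way to make that maintenance cost less of a surprise is to have each scraper check the page layout before it trusts the data. The following is a minimal sketch using only Python's standard library, assuming the results page is an HTML table with `<th>` header cells; the class name, function name and column names are invented for illustration and are not taken from our scrapers.

```python
from html.parser import HTMLParser


class HeaderCollector(HTMLParser):
    """Collect the text of every <th> cell in a page."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._in_th = False

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self._in_th = True
            self.headers.append("")

    def handle_endtag(self, tag):
        if tag == "th":
            self._in_th = False

    def handle_data(self, data):
        if self._in_th:
            self.headers[-1] += data


def check_layout(html, expected_headers):
    """Raise RuntimeError if the table headers differ from what the scraper expects."""
    parser = HeaderCollector()
    parser.feed(html)
    found = [h.strip() for h in parser.headers]
    if found != expected_headers:
        raise RuntimeError(
            f"page layout changed: expected {expected_headers}, found {found}"
        )
    return True


# Hypothetical results page and the columns a scraper was written against.
page = "<table><tr><th>Name</th><th>County</th><th>Filed</th></tr></table>"
check_layout(page, ["Name", "County", "Filed"])
```

A check like this does not remove the need to edit the scraper, but it turns a silent stream of bad data into an explicit error on the day the page changes.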
We are still looking at how we might crowdsource data acquisition and how we can work with local governments to lower their costs of doing business while also better serving our needs for structured data.
Scaling data also needs to happen on the presentation side. Right now, only data that contains a meaningful time and precise location can be displayed in OpenBlock. That means annual school testing data -- which is in high demand among all three of our key audiences -- doesn't look very good on OpenBlock. It maps the student-teacher ratio to the address of the school, but that data is relevant to a much wider audience than the folks who live on the same block as the school, and it stays relevant for 364 days after it's first published. We are looking at how we can efficiently move away from OpenBlock's tyranny of the map without running up huge development costs and reinventing the wheel.
The scaling of data acquisition and presentation is the important civic challenge, but it's our business of bringing together these three key audiences that continues to drive our focus.