Nine Things You Need to Know about Open Data
Saturday, February 22 was Open Data Day, a global community initiative to make and spread open data. To celebrate, Executive Editor Adena Schutzberg shares nine things those who work with geospatial data need to know about open data.
Wikipedia defines open data as “the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.”
Open data are most often not text, but rather maps, geodata, genomes, compounds and the like. In their raw form these data have potential, but citizens, developers, scientists and the curious are needed to give them meaning and value.
Open Via License
Data can be hidden away and remain unused when data owners implement access restrictions, licenses, copyrights, patents and charges for access or re-use. Today, data are made open by the assignment of an open license. These might be newly captured LiDAR datasets that have never had a license, or datasets that once carried a restrictive license (or were never shared) and have since been moved to an open license. Zillow, for example, has explicitly offered its neighborhood boundaries under a Creative Commons license since 2008.
Without an explicit license that prevents it, datasets can be updated, combined, put under copyright and sold.
Open Data Licenses
Just as there are many open source software licenses, there are many open data licenses. For example, if you visit the Glasgow open data site you can search for data under one of eight licenses!
- UK Open Government
- Glasgow Open Government
- OS OpenData License
- GCC OS License
- Open Data Commons Open Database License
- License Not Specified
- Creative Commons NC
- Open Data Commons Attribution
The Open Source Initiative keeps a list of licenses that follow open source principles, and there is a counterpart for open data licenses: Open Definition. Researchers at GovLab seem to be exploring a parallel idea.
Open Government Data and Open Commercial Data
Open data advocates give two key reasons for promoting open government data: (1) transparency and accountability, and (2) the ability of third parties to leverage the data by developing applications and services that address public and private needs.
Organizations besides governments, such as private companies, can and do provide open data. Unlike government data, however, these data are intellectual property, typically the result of private investment, and at least initially may be protected by copyright. Still, as with open government data, the public benefits more when third parties explore and build on private data than when only the owner has access. Lending Club, a peer-to-peer lender with funding from Google, provides current and historical datasets, including the city and state of lenders and borrowers.
Copyright and Open Data
In the United States, federal data are not copyrighted. Rules vary among the 50 states. Government agencies in California, Florida, Massachusetts and others do not claim copyright to the data they create. In other states, New York for example, data can be copyrighted.
To make things even more complex, the County of Santa Clara v. California First Amendment Coalition decision suggests that when a California state agency uses copyrighted data (e.g. from a commercial data vendor), those data become public, but the public is still bound by the original copyright. Said another way, the public may view and analyze the data in the context of transparency and accountability, but use for other purposes without permission would be prohibited.
Open Data Formats
What a user can do with open data is specified by the license. The nuts and bolts of using the data depend on the format in which data are delivered. Geospatial data can be made available in an open format, one that’s published for all to use and build apps against, such as KML. Or, they might be delivered in a proprietary format, one that’s not published for all to use, such as Esri’s geodatabase.
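To illustrate what "published for all to use" can mean in practice, here is a minimal sketch that reads point placemarks from a KML document using only Python's standard library. The sample document, the place name, the coordinates and the function name read_placemarks are hypothetical and exist only for illustration; the namespace is the one used by KML 2.2. Real-world KML files may contain lines, polygons and nested folders that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2 documents.
KML_URI = "http://www.opengis.net/kml/2.2"
KML_NS = {"kml": KML_URI}

# Hypothetical sample document; the name and coordinates are illustrative only.
SAMPLE_KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Sample Library</name>
      <Point><coordinates>-71.06,42.36,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def read_placemarks(kml_text):
    """Return (name, longitude, latitude) tuples for point placemarks."""
    # Parse from bytes so the XML encoding declaration is honored.
    root = ET.fromstring(kml_text.encode("utf-8"))
    placemarks = []
    for pm in root.iter("{%s}Placemark" % KML_URI):
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        coords = pm.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        )
        if not coords.strip():
            continue  # skip placemarks that are not simple points
        # KML stores coordinates as longitude,latitude[,altitude].
        lon, lat = coords.strip().split(",")[:2]
        placemarks.append((name.strip(), float(lon), float(lat)))
    return placemarks


if __name__ == "__main__":
    for name, lon, lat in read_placemarks(SAMPLE_KML):
        print(name, lon, lat)
```

Because the format is openly documented, no vendor software is needed to get at the names and coordinates. A proprietary format would typically require the vendor's own tools or a licensed library to do the same.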
Open Data Policy
Local, state, regional, federal and private entities can develop and implement open data policies. Policy statements typically include details on:
- which data are to be made open and which are not (for privacy, security or other reasons)
- when past and future data are to be available
- where the data are available, typically a URL on the Web
- in what format the data are to be available in bulk and via an API
- under what license the data are to be available
- cost (if any) for reproduction or use of API, etc.
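The items above can be treated as a checklist. As a rough sketch, and not a standard of any kind, the following Python record holds one dataset's policy details and reports which ones are still unspecified. The class name, field names and example URL are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetPolicyEntry:
    """One dataset's entry in a hypothetical open data policy record."""
    is_open: Optional[bool] = None        # which data are open and which are not
    available_from: Optional[str] = None  # when the data are to be available
    url: Optional[str] = None             # where the data are available
    formats: Optional[List[str]] = None   # bulk and/or API formats
    license: Optional[str] = None         # license under which data are offered
    cost: Optional[str] = None            # cost, if any, for reproduction or API use


def missing_details(entry):
    """Return the names of policy details that have not been filled in."""
    missing = []
    for f in fields(entry):
        value = getattr(entry, f.name)
        if value is None or value == "" or value == []:
            missing.append(f.name)
    return missing


# Hypothetical example entry; the URL is a placeholder.
entry = DatasetPolicyEntry(
    is_open=True,
    url="https://data.example.org/parcels",
    formats=["KML"],
    license="Open Data Commons Attribution",
)

if __name__ == "__main__":
    print(missing_details(entry))
```

In this example the entry says nothing about availability dates or cost, so those two details would be flagged as gaps for the policy to fill.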
While it’s possible to measure the success of an open data initiative by how many datasets are available, there is a consensus that the true measure is if and how the data are used. Does the government use the data it has made available? Do software developers? Do citizens? In that sense, an open data initiative is like a library: the value is not in the number of books on the shelves, but how many patrons actually read them.
Special thanks to Bruce Joffe of GIS Consultants for reviewing this article.