Some Perspective on Crowd-sourced Geo-data

The third international State of the Map (SOTM) conference is under way in Amsterdam, NL. This seems like a good time to take stock of the progress made in adapting the open source model of production to the creation and maintenance of geospatial data.

From humble beginnings in August 2004, the OpenStreetMap (OSM) project recently passed the 100,000-user milestone, and a large number of those users are active contributors. The platform has expanded to include not just a data repository and map display, but also a RESTful API that supports a growing community of users who tap into the database. User support is supplied through a comprehensive and well-maintained wiki that provides tips for new users and guidance for map-making. Although the project was initiated in Britain and continues to be based there, contributors now span the world. A graphic example of the worldwide embrace of this project can be seen in this video, which depicts all the edits made to OSM worldwide during 2008. Another notable measure of the project’s success and acceptance is the Obama administration’s embrace of OSM-based maps.
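To give a feel for what tapping into the database through the RESTful API can look like, here is a minimal Python sketch. It assumes the public API v0.6 endpoint and its bounding-box map call. The helper names (map_url, count_elements) and the small XML sample are made up for illustration and are not taken from the OSM project; the sample stands in for a real response so the sketch runs without a network connection.

```python
import xml.etree.ElementTree as ET

# Assumption: base URL of the public OSM API, version 0.6.
API_BASE = "https://api.openstreetmap.org/api/0.6"


def map_url(left, bottom, right, top):
    """Build a bounding-box map request URL (min lon, min lat, max lon, max lat)."""
    bbox = ",".join(str(v) for v in (left, bottom, right, top))
    return f"{API_BASE}/map?bbox={bbox}"


def count_elements(osm_xml):
    """Count the nodes, ways and relations directly under the root of an OSM XML document."""
    root = ET.fromstring(osm_xml)
    return {kind: len(root.findall(kind)) for kind in ("node", "way", "relation")}


# Hand-written stand-in for an API response (hypothetical data, not real OSM content).
SAMPLE = """<osm version="0.6">
  <node id="1" lat="52.3702" lon="4.8952"/>
  <node id="2" lat="52.3710" lon="4.8960"/>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="highway" v="residential"/></way>
</osm>"""

if __name__ == "__main__":
    print(map_url(4.89, 52.36, 4.9, 52.38))
    print(count_elements(SAMPLE))
```

In practice the URL would be fetched with an HTTP client and the response body passed to count_elements; that step is left out here to keep the sketch offline.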

The central criticism leveled against OSM (and, for that matter, other crowd-sourced data initiatives) is that experts trained in geographic data collection will always produce data of higher quality and accuracy than untrained amateurs. While that is arguably true, it is also the case that OSM has provided an on-ramp to a broad range of amateurs who take an ardent interest in having an accurate and open repository of geospatial data at their disposal. If you monitor the OSM-Newbies mailing list for a few days, you will quickly get a sense of how passionate OSM users are about creating an accurate database and of the pains they take to represent real-world features correctly. Moreover, at least one study shows that OSM data compares favorably with data collected by the Ordnance Survey, the UK’s national mapping agency. So while OSM data will always be subject to the same vagaries of accuracy and integrity as, say, Wikipedia, the same kind of user base that acts to self-correct Wikipedia also exists to continually improve OSM.

It is fair to conclude that the open source model of production transfers well to geospatial data creation. In fact, the barriers to entry may even be lower for data creation than for software creation, where a more specialized skill set is required to contribute. More rigorous academic case studies of OSM and other crowd-sourced data initiatives would likely identify the key characteristics necessary to run a successful open source data project and provide insight into the conditions that favor sustained growth and utility of the data.