With the media buzz surrounding open data having abated since the first International Open Government Data Conference (IOGDC) was held in November 2011, it would be easy to believe that the task of opening up government data for public consumption is a fait accompli. Most of the discussion at this year’s IOGDC, held July 10-12, centered on the advantages of, and roadblocks to, creating an open data ecosystem within government, and on the need to establish the right mix of policies to promote a culture of openness and sharing, both within and between government agencies and externally with journalists, civil society, and the public at large. By the numbers, the open government data movement has much to celebrate: 1,022,787 datasets from 192 catalogs in 24 languages, representing 43 countries and international organizations.

The looming questions about the utility of open government data make it clear, however, that the movement is still in its early stages. Much remains to be done to provide usable, reliable, machine-readable and valuable government data to the public.

One of the big questions addressed during the conference was what, exactly, the role of government should be in providing data to the public. Should open government data portals serve only as catalogs of data in a variety of formats, leaving it to external developers, civil society groups and journalists to “mash up” and interpret that data? This was the view of Rufus Pollock of the Open Knowledge Foundation, the nonprofit organization that developed the open-source CKAN data management system used by governments and communities in the United Kingdom, Austria, Brazil, the European Union, and many other countries and municipalities. In his opinion, data re-use is the most important indicator of the success of any open government data initiative; it matters little how much credit the government receives for providing the data, or in what context the data are presented.

Other presenters, such as Carlos Viniegra, head of Mexico’s Digital Government Unit, suggested that where developers, civil society groups and journalists have limited capacity to interpret data, these groups have pressed the government to take a more active role in helping citizens do so. Some government agencies, local governments and multilateral agencies are moving in this direction through app challenges, capacity-building programs, and more user-friendly data sites that provide tools to help users visualize data.

Kevin Merritt of Socrata, the open data platform used by Data.gov, the United Nations, and the city of Chicago, noted that with the proposal for a ‘Datasets’ addition to schema.org, search engines such as Google and Bing are likely to soon serve as capable catalogs of government data. Given this prospect, attendees discussed whether the next generation of government data sites should focus more on helping specific communities search for and filter the government data they need, as the US Department of Health and Human Services’ HealthData.gov website does.
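To make Merritt’s point concrete, here is a minimal sketch of the kind of machine-readable description a data portal might embed in a dataset page so that search engines can index it. It uses the `Dataset` and `DataDownload` types as they now appear in schema.org; the proposal discussed at the conference predates that vocabulary, so details may differ. The function name, dataset title, URLs and publisher are hypothetical placeholders.

```python
import json


def dataset_jsonld(name, description, landing_url, publisher, csv_url, license_url):
    """Build a schema.org Dataset description as a JSON-LD string.

    The function name and all example values are hypothetical illustrations.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": landing_url,
        "license": license_url,
        "publisher": {"@type": "Organization", "name": publisher},
        # Each downloadable file is described as a DataDownload.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "text/csv",
                "contentUrl": csv_url,
            }
        ],
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    markup = dataset_jsonld(
        name="Hospital Inpatient Discharges (example)",
        description="Hypothetical example dataset used for illustration.",
        landing_url="https://data.example.gov/dataset/discharges",
        publisher="Example State Department of Health",
        csv_url="https://data.example.gov/dataset/discharges.csv",
        license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    )
    # A portal would place this inside <script type="application/ld+json"> on the page.
    print(markup)
```

Because the description travels with the dataset page itself, any crawler that understands the vocabulary can build its own catalog, which is what makes search engines plausible competitors to dedicated portals.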

Many presenters at the conference noted the poor quality of the data being made available on government sites, citing the lack of machine-readable formats. Even where datasets are available in machine-readable formats, observers noted that very little context and metadata about them are provided, and that errors and inconsistencies are common. John Wonderlich, Policy Director of the Sunlight Foundation, suggested that, at best, government data quality can be considered “a mixed bag.”

With these deficiencies in data quality and standards, it is not surprising that there was very little public discussion about linked data at the second IOGDC. When pressed, attendees offered the following reasons why linked open data was not a major topic of this conference:

  • Linked open data is a relatively new concept, and a lot of government tooling (at least in the US) is not compatible with it,
  • Agencies and the contractors who work with them would rather not make the expenditures necessary to publish and consume linked data,
  • Many developers (even in the UK) don’t like linked data because it requires re-training, and
  • The US government faces the considerable challenge of establishing ontologies and determining persistent URIs.

Many conference presenters and attendees held the opinion that the tools and standards for creating and using linked open data could be modified to make them simpler for the open government data community to use. Erik Mannens of Belgium’s IBBT-MMLab offered a distinctive and straightforward suggestion: a tool that would help governments publish linked open data using familiar file formats (such as CSV), and that would help developers use this data:

“[For] the people who are accustomed to CSV–let them upload their CSV, but give them a tool within their CSV where they know the column names–so in a very simple way let them just drag and drop column names to known linked open data principles, even if they don’t know about [them], and then automatically publish the CSV files with higher semantics.

And on the other side, you have a framework that takes this linked open data and just give the developers an API where they can apply a REST to Read, Write, Put, and Link, that’s also enough for them, so they don’t have to know the nitty gritty details about linked open data as well.”
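The publishing half of Mannens’s proposal amounts to a column-to-vocabulary mapping applied row by row. As a rough illustration only (this is not IBBT-MMLab’s tool, and the column names, base URI and helper names are assumptions), the Python sketch below converts CSV rows into N-Triples once each column has been mapped to a predicate URI. The predicates come from real vocabularies (RDF Schema and the W3C WGS84 geo vocabulary).

```python
import csv
import io

# Hypothetical mapping a publisher might build by "dragging" CSV column names
# onto existing vocabulary terms. Column names here are illustrative assumptions.
COLUMN_MAP = {
    "name": "http://www.w3.org/2000/01/rdf-schema#label",
    "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "long": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
}
BASE_URI = "http://data.example.gov/school/"  # hypothetical persistent URI base


def escape_literal(value):
    """Escape a string for use as an N-Triples literal (backslash first)."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column="id"):
    """Turn each CSV row into RDF triples, one subject URI per row.

    Assumes values in id_column are already URI-safe. Columns without an
    entry in COLUMN_MAP are skipped rather than guessed, and all values are
    emitted as plain string literals.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE_URI}{row[id_column]}>"
        for column, predicate in COLUMN_MAP.items():
            value = row.get(column)
            if value:  # skip missing or empty cells
                lines.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "id,name,lat,long\n1,Lincoln Elementary,38.90,-77.03\n"
    print(csv_to_ntriples(sample))
```

The publisher only has to supply the mapping; the conversion itself is mechanical, which is the point of Mannens’s suggestion. A production tool would also need typed literals (for example, decimals for coordinates) and a policy for minting persistent URIs.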

That said, the open government data community is growing and is quickly coming to realize the need for linked open data. In the words of IOGDC organizer and Data.gov Evangelist Jeanne Holm, open data is very much “just past infancy – so think of us as toddlers perhaps – and maybe we aren’t going to [be] fully able to embrace all of the capabilities of linked data for a while yet, but you can see the beginnings of that.”

Overall, wherever linked open data was mentioned, most people agreed that it will become more of a standard as the open data community grows and evolves. During the closing panel discussion, “Challenges for the Future,” Rensselaer Polytechnic Institute’s Jim Hendler pointed out that as the number of government datasets available on the internet continues to grow exponentially, so will the need to integrate many, many small datasets, and “the limits of traditional database approaches will soon be met.”

Kristen Milhollin is a writer, mother, champion of good causes, and semantic web enthusiast.