Enabling Data Independence for Government Transparency

by Ralph Hodgson, CTO, TopQuadrant, Inc.

Open Government has become a popular theme, both in the U.S. and in other countries. With “Transparency” gaining momentum, more categories and larger amounts of government data are becoming available on the web. In the U.S., one impetus for this was Barack Obama’s memorandum to the heads of Executive Departments and Agencies, which included the following statement:

“… Government should be transparent.  Transparency promotes accountability and provides information for citizens about what their Government is doing.  Information maintained by the Federal Government is a national asset. My Administration will take appropriate action, consistent with law and policy, to disclose information rapidly in forms that the public can readily find and use. Executive departments and agencies should harness new technologies to put information about their operations and decisions online and readily available to the public. Executive departments and agencies should also solicit public feedback to identify information of greatest use to the public.”

Since that memorandum, a number of government and non-government initiatives have emerged in the U.S. The following are worth mentioning, both as illustrations of what is currently happening in the U.S. Open Government space and as motivation for the oeGOV initiative:

  1. The National Dialogue, at http://www.thenationaldialogue.org/, was a government initiative for collaboration on ideas for Open Government. oeGOV was born out of two ideas: "Use small OWL ontologies to model recovery and deploy across all government" and Tim Berners-Lee’s vision of "Linked Open Data";
  2. Data.gov, at http://www.data.gov/, is a government portal serving as a catalog of published government data;
  3. Apps for Democracy, at http://www.appsfordemocracy.org/, is a District of Columbia initiative to make DC.gov’s Data Catalog useful for the citizens, visitors, businesses and government agencies of Washington, DC;
  4. Apps for America, at http://www.sunlightlabs.com/contests/appsforamerica/, is a SunlightLabs initiative inviting developers to build demonstrators that show how government data can be made accessible, interpretable and actionable. Some of the projects submitted to these contests use RDF, for example, ThisWeKnow;
  5. GOV Data to RDF, at http://data-gov.tw.rpi.edu/wiki/Main_Page, is a project at Rensselaer Polytechnic Institute (RPI) that is part of the Tetherless World Constellation. On July 16, 2009, the project announced that it had translated 16 data.gov datasets to RDF, contributing almost 3 billion triples to its triplestore(s);
  6. A growing number of Web sites are devoted to providing insight into the political process and to reporting on it, for example: AnalyzerThe.us, Citability.org, FairSpin.org, FollowTheMoney.org, Governing.com, GovStats.org, Recovery.gov, and Transparent-gov.com;
  7. Transparency Camps, whose last West Coast camp was held at the Google Campus, on the web at http://transparencycamp.org/, are about: “convening a trans-partisan tribe of open government advocates from all walks — government representatives, technologists, developers, NGOs, wonks and activists — to share knowledge on how to use new technologies to make our government transparent and meaningfully accessible to the public.”

For some, transparency means making government data available online for browsing and searching. For others, it means using the web to make government “accountable.” Within this latter group, accountability can be as specific as how elected members of the Senate and the House of Representatives conduct their activities in the political process, toward outcomes that can be correlated with policy and election promises.

Such goals typically require connecting and correlating data from different government sources, at a scale that only really works with automation. For example, it may be important to look at politicians’ voting records on environmental issues in the context of the industries and factories present in their districts, the number of pollutant spill accidents, and other relevant data. As more raw data becomes available, analysis of this sort is moving from the realm of the labor-intensive research project to that of an easily accessible query against linked data.

Placing increasing amounts of raw data on the Web is a good first step towards government transparency. But to be truly useful, the data needs to be connectable. Because data coming from different sources is idiosyncratic, connecting across data sets today requires heroic efforts from brigades of programmers. To truly support transparency goals, government data needs to be Findable, Interpretable, Decidable and Actionable; in short, FIDA-friendly.
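To make the “heroic efforts” concrete, here is a minimal Python sketch assuming two hypothetical datasets, voting records and pollutant spill counts, that label the same congressional districts in different ways. A join on the raw labels finds nothing in common; once each source's labels are mapped to one shared URI, the join becomes trivial. All records and the example.org URIs are invented for illustration.

```python
# Hypothetical shared namespace for district URIs (illustrative only).
DISTRICT = "http://example.org/district/"

# Dataset A (hypothetical): voting records keyed by one label style.
votes = [
    {"district": "CA-12", "member": "Rep. A", "env_score": 80},
    {"district": "TX-07", "member": "Rep. B", "env_score": 35},
]

# Dataset B (hypothetical): spill counts keyed by a different label style.
spills = [
    {"district": "California 12th", "spills": 3},
    {"district": "Texas 7th", "spills": 11},
]

# A naive join on the raw strings matches nothing.
naive = [v for v in votes for s in spills if v["district"] == s["district"]]

# Per-source mapping tables from local labels to a shared URI.
# Writing these by hand is the costly step that published URIs would remove.
votes_to_uri = {"CA-12": DISTRICT + "CA-12", "TX-07": DISTRICT + "TX-07"}
spills_to_uri = {"California 12th": DISTRICT + "CA-12",
                 "Texas 7th": DISTRICT + "TX-07"}

def join_on_uri(votes, spills):
    """Join the two datasets on the shared district URI."""
    spills_by_uri = {spills_to_uri[s["district"]]: s["spills"] for s in spills}
    rows = []
    for v in votes:
        uri = votes_to_uri[v["district"]]
        if uri in spills_by_uri:
            rows.append((v["member"], v["env_score"], spills_by_uri[uri]))
    return rows

print(len(naive))                  # 0
print(join_on_uri(votes, spills))  # [('Rep. A', 80, 3), ('Rep. B', 35, 11)]
```

The mapping tables are exactly the per-source work that would disappear if every publisher used the same URIs for the same districts.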

The challenges that confront us when we deal with data have been well reported, and can be summarized as:

  1. Data Accessibility. Can we connect to the data? When we do connect to it, is it in a format that we can process? When we do process it, is the data complete?
  2. Data Quality. Have names and identifiers been used consistently? Are the strings that should denote controlled entities used consistently? Are data values always expressed with the same data types and units of measure?
  3. Data Compatibility. Data has a data type and is often dimensioned: that is, numbers have formats and units of measure. Do we know that data from different sources can be aggregated, correlated and consolidated?
  4. Referential Integrity. Is the data about the same thing? Is the data from the same Government Body? Are the locations that the data refer to the same?
  5. Data Provenance. What is the source of the data? Can the data be trusted? Who has accredited the data?
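Some of the compatibility and provenance questions above can be checked mechanically before data is combined. The following Python sketch assumes a hypothetical table of unit codes and conversion factors; it refuses to add values whose units cannot be reconciled, and it keeps the source of each value so an error can say where a bad value came from.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    value: float
    unit: str    # hypothetical unit code, e.g. "USD" or "kUSD"
    source: str  # provenance: which publisher supplied the value

# Hypothetical conversion table: unit code -> (canonical unit, factor).
TO_CANONICAL = {
    "USD": ("USD", 1.0),
    "kUSD": ("USD", 1000.0),
    "kg": ("kg", 1.0),
    "t": ("kg", 1000.0),
}

def aggregate(measurements):
    """Sum measurements after converting them to one canonical unit.

    Raises ValueError for an unknown unit or for units of different
    dimensions, instead of silently adding incompatible numbers.
    """
    total = 0.0
    canonical = None
    for m in measurements:
        if m.unit not in TO_CANONICAL:
            raise ValueError(f"unknown unit {m.unit!r} from {m.source}")
        unit, factor = TO_CANONICAL[m.unit]
        if canonical is None:
            canonical = unit
        elif unit != canonical:
            raise ValueError(f"cannot aggregate {unit} from {m.source} with {canonical}")
        total += m.value * factor
    return total, canonical

print(aggregate([Measurement(2.5, "kUSD", "agency-a"),
                 Measurement(500, "USD", "agency-b")]))  # (3000.0, 'USD')
```

Mixing, say, a dollar amount with a mass raises an error rather than producing a meaningless total.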

“Freeing the data” by publishing more and more diversely formatted data on the web does not give us the “Data Independence” that is needed. To move beyond information overload, we have to “think data-based and not data-bases.” We achieve this by making data typed, linkable, composable and inferenceable through RDF and OWL.
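As a rough illustration of what “typed” and “linkable” mean in RDF terms, the plain-Python sketch below (no RDF library; the example.org namespace and the predicate names are hypothetical) represents each publisher's data as a set of triples whose literals carry XSD datatype IRIs. Because both publishers use the same IRI for the district, merging their graphs is a set union, which matches RDF graph merge as long as the graphs share no blank nodes.

```python
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical namespace for illustration

class Literal(NamedTuple):
    """A typed RDF literal: lexical form plus datatype IRI."""
    lexical: str
    datatype: str

# Each graph is a set of (subject, predicate, object) triples.
publisher_a = {
    (EX + "district/CA-12", EX + "spillCount", Literal("3", XSD + "integer")),
}
publisher_b = {
    (EX + "district/CA-12", EX + "representedBy", EX + "person/rep-a"),
    (EX + "person/rep-a", EX + "envScore", Literal("80", XSD + "integer")),
}

# With no blank nodes involved, merging graphs is just set union:
# triples about the same IRI connect without any join code.
merged = publisher_a | publisher_b

def objects(graph, subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def to_python(lit):
    """Interpret a typed literal; its datatype says how to read the lexical form."""
    if lit.datatype == XSD + "integer":
        return int(lit.lexical)
    raise ValueError(f"unsupported datatype {lit.datatype}")

district = EX + "district/CA-12"
rep = objects(merged, district, EX + "representedBy")[0]
print(to_python(objects(merged, district, EX + "spillCount")[0]))  # 3
print(to_python(objects(merged, rep, EX + "envScore")[0]))          # 80
```

A real deployment would use an RDF toolkit and SPARQL rather than hand-rolled lookups; the point here is only that shared IRIs and explicit datatypes let independently published data compose.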

oeGOV (http://www.oegov.org) is an initiative started by TopQuadrant to establish foundation ontologies for data source navigation, data aggregation, data transformation and sense-making. Ontologies for eGovernment enable:

  • Distributed creation and maintenance of the government data itself and of information about that data, such as where it is used
  • Standardization of neutral models for data exchange and transformation
  • Aggregation of data through the use of RDF/OWL formats
  • Interpretation of data through precise semantics and controlled vocabularies, including geospatial and temporal aspects
  • Navigation over who is publishing what in what format
  • Provenance and trust in the sources of data
  • Correlations and comparisons of data
  • Understanding of how the political process and policy making align with outcomes
  • Government accountability for efficiencies and effectiveness
  • Citizen awareness and appreciation of government initiatives

TopQuadrant has been building ontologies for eGovernment, using the W3C standard languages RDF/S and OWL, since as far back as 2003. The first eGovernment ontologies were the Federal Enterprise Architecture (FEA) Ontologies. At that time we needed an ontology of government bodies in order to build what we called a “Capability Manager.” This was a system, based on Semantic Web technologies, that could advise different stakeholders on the capabilities that were being provided and developed to support the FEA. We envisioned a system accessible through Web Services that would allow agencies, other governments, businesses and citizens to make queries about the FEA model, to find capabilities that support agency services, and to assess the compliance of their agency business models and architectures with the FEA.

In 2003, there was no comprehensive and trusted source for the organizational structure of the U.S. Government. Today, as far as we know, this is still the case. USA.gov provides a directory of government bodies, at http://www.usa.gov/Agencies/Federal/All_Agencies/index.shtml, but there is still no machine-processable version that defines the URIs of all Government bodies. Hence the motivation for oeGOV.

While oeGOV currently focuses on ontologies of Government and on datasets of U.S. Government branches, agencies, departments, offices and state governments, the intention is to go beyond this with ontologies that:

  • Enable navigation over data.gov
  • Connect agency services to the FEA
  • Facilitate data merging
  • Enable consolidation of data for reports and visualization
  • Provide insight into transparency

oeGOV has already published a number of ontologies. In the spirit of incremental releases, the first set was published on the oeGOV blog site, www.oegov.us/blog, on August 1, 2009, the date that celebrates “Swiss Independence” and one that deserves to be called “Data Independence Day”.

At one level, the oeGOV ontologies can also be thought of as controlled vocabularies in RDF/OWL, establishing the URIs for every Government Body, such as usgov:DHS, usgov:DOC, usgov:DOJ, usgov:DOT and usgov:EPA. Government Bodies are related to one another through a model of Government structure, and their reasons for being can be correlated to Government Statutes. Building on this foundation, oeGOV ontologies are being used to provide an OWL schema describing who is publishing what data and where that data can be found.
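A controlled vocabulary of this kind can be pictured as a set of prefixed names plus relations between them. The Python sketch below assumes a hypothetical namespace IRI for the usgov: and gov: prefixes, a hypothetical partOf relation, and illustrative bodies such as usgov:FEMA and usgov:ExecutiveBranch; the actual oeGOV IRIs and property names may differ. It expands prefixed names to full IRIs and walks the body hierarchy transitively.

```python
# Hypothetical namespace IRIs: the real oeGOV namespaces may differ.
PREFIXES = {
    "usgov": "http://example.org/oegov/usgov#",
    "gov": "http://example.org/oegov/gov#",
}

def expand(curie):
    """Expand a prefixed name such as 'usgov:DHS' to a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local

# (sub-body, parent body) pairs for an illustrative 'gov:partOf' relation.
PART_OF = {
    ("usgov:FEMA", "usgov:DHS"),
    ("usgov:TSA", "usgov:DHS"),
    ("usgov:DHS", "usgov:ExecutiveBranch"),
    ("usgov:EPA", "usgov:ExecutiveBranch"),
}

def ancestors(body):
    """All bodies that `body` is transitively part of (assumes no cycles)."""
    parents = {p for s, p in PART_OF if s == body}
    result = set(parents)
    for p in parents:
        result |= ancestors(p)
    return result

def members(body):
    """All bodies that are transitively part of `body`."""
    return {s for s, _ in PART_OF if body in ancestors(s)}

print(expand("usgov:DHS"))
print(sorted(ancestors("usgov:FEMA")))
print(sorted(members("usgov:DHS")))
```

In OWL the same effect would typically come from declaring the relation transitive and letting a reasoner infer the closure; the recursion here just makes that inference explicit.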

The oeGOV ontologies are being built in TopQuadrant’s TopBraid Suite. Some of the foundation concepts centered on ‘gov:Body’ are shown in the figure below:

[Figure: TQ Gov Body]

The diagram illustrates how data in different formats is associated with a Government Body through publication events. Over 500 government bodies are in the current release of the ontologies, which are catalogued at http://www.oegov.us/blog/?page_id=13. For example, the N3 graph of usgov:DHS is at http://www.oegov.us/democracy/national/models/owl/us1gov_dhs.n3
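The idea of publication events can be sketched as records linking a Government Body to a dataset, a format and a location, which is enough to answer “who is publishing what, in what format, and where.” The Python sketch below is only an illustration under stated assumptions: the PublicationEvent fields, dataset names and example.org URLs are hypothetical and are not the oeGOV schema.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class PublicationEvent:
    """Hypothetical record linking a Government Body to a published dataset."""
    publisher: str  # e.g. "usgov:EPA"
    dataset: str    # illustrative dataset name
    format: str     # e.g. "CSV", "XML", "RDF"
    location: str   # illustrative URL where the data can be found

events = [
    PublicationEvent("usgov:EPA", "spill-reports", "CSV", "http://example.org/epa/spills.csv"),
    PublicationEvent("usgov:EPA", "spill-reports", "XML", "http://example.org/epa/spills.xml"),
    PublicationEvent("usgov:DOT", "road-safety", "CSV", "http://example.org/dot/roads.csv"),
]

def who_publishes_what(events):
    """Map each publisher to {dataset: sorted list of formats}."""
    index = defaultdict(lambda: defaultdict(set))
    for e in events:
        index[e.publisher][e.dataset].add(e.format)
    return {pub: {ds: sorted(fmts) for ds, fmts in by_ds.items()}
            for pub, by_ds in index.items()}

def find(events, fmt):
    """Sorted locations of every dataset available in the given format."""
    return sorted(e.location for e in events if e.format == fmt)

print(who_publishes_what(events))
print(find(events, "CSV"))
```

Modeling the event as a separate record, rather than attaching formats directly to the body, lets one dataset appear in several formats and locations without duplicating facts about the publisher.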

Building oeGOV is a huge effort, and we invite all interested parties to participate. We are particularly keen to have participation from U.S. Government Agencies, who we feel should own this work. To facilitate participation from different organizations and groups, the ontologies have been architected in a highly modular way.

If you are interested in participating in oeGOV, please send an email to rhodgson@topquadrant.com.