Library of Congress Embraces Linked Data Movement

Jennifer Zaino
SemanticWeb.com Contributor

Is Washington D.C. waking up to the semantic web? The evidence is growing that maybe it is.

A visit to a page on the Library of Congress site, updated at the end of last month, reveals plans for services that implement “the Linked Data movement’s approach of exposing and interconnecting data on the Web via dereferenceable URIs. We aim to make resources available on this site within 6-8 weeks.”

According to the site, the first resources to be released will be the Library of Congress Subject Headings (LCSH), which it says “is an almost verbatim re-release of the system and content once found at the popular prototype lcsh.info service. The primary exception will be that the URIs for the data values will no longer take the form http://lcsh.info/{identifier}. Instead, they will start with http://id.loc.gov/authorities/{identifier}.” Subject headings are a type of controlled vocabulary that is used to take the guesswork out of searching by using a single term to describe a subject.
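The URI change described above is a simple prefix swap, which can be sketched in a few lines of Python. This is only an illustration: the function name `migrate_uri` is hypothetical, and the sketch assumes the identifier and any fragment carry over unchanged, which the site's wording suggests but does not spell out.

```python
# Prefixes quoted on the Library of Congress site.
OLD_PREFIX = "http://lcsh.info/"
NEW_PREFIX = "http://id.loc.gov/authorities/"


def migrate_uri(uri: str) -> str:
    """Rewrite a prototype lcsh.info URI to the id.loc.gov form.

    Assumption: everything after the prefix (identifier, fragment)
    is kept as-is. URIs that do not start with the old prefix are
    returned untouched.
    """
    if not uri.startswith(OLD_PREFIX):
        return uri
    return NEW_PREFIX + uri[len(OLD_PREFIX):]


print(migrate_uri("http://lcsh.info/sh95000541"))
```

Anyone who stored lcsh.info links in their own records could run a rewrite like this once the new service is live, provided the assumption about unchanged identifiers holds.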

Work has been underway for a while: In a June 2008 report, Deanna B. Marcum, associate librarian for Library Services, writes that a current action to position the Library of Congress’ Technology for the Future includes expressing library standards in machine-readable and machine-actionable formats, in particular those developed for use on the Web.

One of the current actions noted in the report has been the Library’s experiment to make LCSH available in Resource Description Framework (RDF) using SKOS (Simple Knowledge Organization System), a family of languages expressly developed to represent tools such as thesauri, classification schemes, subject heading systems, and taxonomies for the Semantic Web.
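To give a rough sense of what a subject heading looks like in SKOS, here is a minimal Python sketch that writes one concept as RDF triples in N-Triples syntax. The SKOS and RDF namespace URIs are the standard W3C ones. The concept URI and its label, "World Wide Web", come from the Library's own example. The choice of properties, the `en` language tag, and the helper name `to_ntriples` are assumptions for illustration, not a description of what the Library actually publishes.

```python
# Standard W3C namespace URIs.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Concept URI taken from the Library of Congress example.
concept = "http://id.loc.gov/authorities/sh95000541#concept"

# Each triple is (subject, predicate, object). Objects are tagged
# as either a URI or a language-tagged literal.
triples = [
    (concept, RDF_TYPE, ("uri", SKOS + "Concept")),
    # Assumption: the heading is exposed as an English prefLabel.
    (concept, SKOS + "prefLabel", ("literal", "World Wide Web", "en")),
]


def to_ntriples(rows):
    """Serialize the triples as N-Triples, one statement per line."""
    lines = []
    for subject, predicate, obj in rows:
        if obj[0] == "uri":
            rendered = f"<{obj[1]}>"
        else:
            # Escape backslashes and double quotes inside the literal.
            text = obj[1].replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{text}"@{obj[2]}'
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

A real SKOS record would likely carry more than this, such as broader and narrower terms, but the two statements above are enough to show the shape of the data.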

Next up, according to the site, will be the release of content drawn from various MARC (Machine-Readable Cataloging) formats, including MARC Geographic Area Codes, MARC Language Codes, and MARC Relator Codes.

Benefits the Library of Congress expects to see from these efforts include the ability to establish provenance when minted data values maintained by the Library in the loc.gov domain are used in Linked Data or the Semantic Web, and the opportunity for the Library to serve as a conveyor for other organizations on how to convert and share their data as Linked Data.

It expects that users, both human and machine, will benefit from granular access to individual data values at no cost, the ability to link to Library of Congress data values within their own metadata, and a simpler interface for requesting resources over HTTP.

The announcement came at about the same time as reports that Whitehouse.gov is using RDFa. That probably shouldn’t come as a surprise, given the video posted on the Change.gov website highlighting the Obama Administration’s TIGR (Technology, Innovation, and Government Reform) team, in which team members drawn from the worlds of government and business state flat-out that the federal government is way behind in disseminating information and providing transparency to the public.


In the video, Andrew McLaughlin, head of global public policy and government affairs for Google, notes that one of the most obvious ways to make the government more open is to take the data taxpayers have paid for, get it on the web so that people can download it, and mash it up in ways that will help citizens understand their world better and even drive economic activity.

Says Vivek Kundra, chief technology officer for the District of Columbia, “Why is it that we can’t innovate and find better ways of bringing services, lowering the cost of government operations, and driving transparency? Those are the kinds of things you are going to see in this administration.”

And on top of that, Hillary Clinton’s already got a FOAF record. There’s no telling what we’ll see next — maybe even more direct investment by Washington into research on the potential of the semantic web.

This Library of Congress site serves as a placeholder for forthcoming web services that will enable both humans and machines to programmatically access authority data at the Library of Congress. The initial services offered are influenced by — and therefore implement — the Linked Data movement’s approach of exposing and inter-connecting data on the Web via dereferenceable URIs. They aim to make resources available on this site within 6-8 weeks.

Expanding Access

For those interested in the nitty-gritty details, here’s an example of how it will work: Users and machines will request the URI of interest over HTTP. For example, to access the data value “World Wide Web” in the Library of Congress Subject Headings, one would request this URI: http://id.loc.gov/authorities/sh95000541#concept.

When requesting this URI, users will have mechanisms for specifying which serialization of the data they want to receive. The benefits of this system include access to data at no cost, the ability to download entire controlled vocabularies and the values within them in numerous formats, and the ability to link to Library of Congress data values within your metadata.
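As a rough sketch of what such a request might look like from a program, the Python below prepares an HTTP request for the example URI and asks for a particular serialization. The site does not say which mechanism it will offer, so the use of an HTTP `Accept` header, the `application/rdf+xml` media type, and the helper name `build_request` are all assumptions. The request is only constructed, not sent, so the snippet runs without network access.

```python
from urllib.parse import urldefrag
from urllib.request import Request

# Example URI from the Library of Congress site.
CONCEPT_URI = "http://id.loc.gov/authorities/sh95000541#concept"


def build_request(uri, media_type="application/rdf+xml"):
    """Prepare an HTTP GET for a Linked Data URI.

    Assumption: the service chooses a serialization based on the
    Accept header (HTTP content negotiation).
    """
    # The "#concept" fragment is never sent to a server, so strip it
    # and request the document the fragment lives in.
    url, _fragment = urldefrag(uri)
    return Request(url, headers={"Accept": media_type})


req = build_request(CONCEPT_URI)
print(req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual fetch once the service is available.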
