A fascinating project has been undertaken by the partners of the Pan-Canadian Documentary Heritage Network (PCDHN): a proof-of-concept that showcases Linked Open Data visualizations for “Out of the Trenches.” This is a look at the First World War from the Canadian perspective: war songs, postcards, newspapers, photos, films, and these resources’ intersection with Canadian soldiers who fought in the war.

 

These digital resources from organizations such as McGill University, the Universities of Alberta, Calgary and Saskatchewan, and the Bibliothèque et Archives nationales du Québec have been linked through existing metadata provided in formats ranging from spreadsheets to MODS XML to RDF. Rather than reduce the metadata to a common subset, the approach was to maximize its use by moving to the “web of data” concept, so that the resources can be combined in different and unexpected ways, according to the proof-of-concept final report that was issued on the project.

The premise was to expose the metadata for these resources as RDF/XML, drawing on existing published ontologies (such as the Event Ontology, the Dublin Core Ontology and the Biographical Ontology), element sets, vocabularies and resources like the Geonames geographical database, in order to maximize discovery by the user community and contribute to the Semantic Web.
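To make the idea concrete, here is a minimal sketch in Python of what such “data statements” might look like when written out as N-Triples (a simpler line-based RDF serialization than the RDF/XML the project used). The Dublin Core and BIO namespaces are the published ones; the `example.org` URIs, the resource names and the article title are placeholders invented for illustration, not the project’s actual identifiers.

```python
# Published vocabulary namespaces, plus a placeholder base for project data.
DCTERMS = "http://purl.org/dc/terms/"
BIO = "http://purl.org/vocab/bio/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/pcdhn/"  # hypothetical base URI, not the real one


def uri(value):
    """Wrap an absolute URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical identifiers for a soldier, an event and a digitized resource.
soldier = uri(EX + "soldier/mike-foxhead")
death = uri(EX + "event/mike-foxhead-death")
article = uri(EX + "resource/enlistment-article")

triples = [
    (soldier, uri(BIO + "event"), death),
    (death, uri(RDF + "type"), uri(BIO + "Death")),
    (article, uri(DCTERMS + "subject"), soldier),
    (article, uri(DCTERMS + "title"), literal("Newspaper article on enlistment")),
]

print(to_ntriples(triples))
```

In real data the place of an event would typically point at a Geonames URI rather than a local identifier, which is what lets outside datasets connect to the same place.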

The report notes that the visualization elaborates on two dimensions: the Canadian Expeditionary Force (for which an “authority” of biographical information has been created for each soldier) and Events. A video tour that gives an example of the fruits of the work takes viewers through the story of Mike Foxhead, an Aboriginal soldier from Gleichen, Alberta. It links together the records of his participation in the war, starting with a newspaper article from the University of Calgary digital archives about men being asked to enlist during a dance at the Blackfoot Reserve in Gleichen. It follows through to a newspaper clipping of Foxhead’s first letter home after being posted overseas, courtesy of the University of Alberta, and then on to pages from the diaries of the 50th Battalion on the day he died, which are sourced from Library and Archives Canada (LAC).

Along the way, intersections can be explored via facts from his authority, such as his place of birth or death linked via Geonames, and via resources from other relationships that are expressed there. For example, it would be possible to gain more insight into the Siksika (Blackfoot) people, of which he was a member.

The report makes the power of Linked Open Data pretty clear:

The “visualization” application provides only a sampling of what is possible: the metadata provides many more things that can be discovered using the ontologies/element sets implemented. For example: songs have composers, lyricists, and performers, some of whom were also CEF soldiers; events occurred in many geographic areas over time. The potential for discovery is limited only by the imagination of the user of the metadata: “data statements” (or facts) can be added to the “web of data” by anyone, and new “stories”/“visualizations” can be developed by combining these facts with the PCDHN “proof-of-concept” knowledge base.

The report also concludes that the project team’s foremost lesson from creating the proof-of-concept is that “RDF and LOD are an elegant approach for integrating resource discovery across different domains, institutions, and services.” Once the PCDHN metadata was transformed to RDF, it reports, the integration of resources from very different collections and source metadata schemes was entirely transparent, with no need to state any explicit relationships between resources.
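A rough way to see why no explicit cross-collection mapping is needed: if each institution’s metadata is a set of triples and they refer to the same soldier by the same URI, then integration is just a set union. The sketch below, in Python, assumes three tiny hypothetical collections with invented `example.org` identifiers; it illustrates the principle and is not the project’s code.

```python
# Each "collection" is a set of (subject, predicate, object) triples.
# All example.org URIs below are invented placeholders.
SOLDIER = "http://example.org/pcdhn/soldier/mike-foxhead"
SUBJECT = "http://purl.org/dc/terms/subject"  # Dublin Core "subject" property

calgary = {("http://example.org/calgary/article-1", SUBJECT, SOLDIER)}
alberta = {("http://example.org/alberta/letter-1", SUBJECT, SOLDIER)}
lac = {("http://example.org/lac/war-diary-page-1", SUBJECT, SOLDIER)}


def merge(*graphs):
    """Combine any number of triple sets; duplicates collapse automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def resources_about(graph, entity):
    """Return, sorted, every resource whose dcterms:subject is `entity`."""
    return sorted(s for s, p, o in graph if p == SUBJECT and o == entity)


combined = merge(calgary, alberta, lac)
for resource in resources_about(combined, SOLDIER):
    print(resource)
```

The three resources end up connected only because they share the soldier’s URI; nothing in the code relates one institution’s records to another’s.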

At the same time, it urges others embarking on LOD projects to take care in selecting target ontologies and element sets. That includes considering what ongoing support there is for these resources, and how stable they are in terms of providing backward/forward compatibility as ontologies and element sets evolve.

There isn’t yet a production-ready application: the proof-of-concept currently supports only Firefox. It was built with Mulgara, an open source, scalable RDF triple-store written in Java, which stores the data statements for the resources and provides a SPARQL endpoint for querying them. The data store is hosted both by the University of Alberta and locally. JavaScript running in the browser queries the triple-store and renders the actual “visualization” of the query results.
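For readers unfamiliar with how a client talks to a SPARQL endpoint, the following sketch shows the general pattern: a query is sent as the `query` parameter of an HTTP request, and bindings are read out of the response. It is written in Python rather than the project’s browser JavaScript, the endpoint URL and soldier URI are placeholders, and it assumes the endpoint can return the standard SPARQL JSON results format, which the report does not confirm for this deployment. No network call is made; a hand-written sample response stands in for the server.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

# Find every resource whose Dublin Core subject is a given (placeholder) soldier.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?resource WHERE {
  ?resource dcterms:subject <http://example.org/pcdhn/soldier/mike-foxhead> .
}
"""


def build_query_url(endpoint, query):
    """Build a GET URL carrying the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def extract_values(results_json, variable):
    """Pull one variable's values out of a SPARQL JSON results document."""
    data = json.loads(results_json)
    bindings = data["results"]["bindings"]
    return [b[variable]["value"] for b in bindings if variable in b]


# Hand-written sample in the SPARQL JSON results shape (not real project data).
sample = json.dumps({
    "head": {"vars": ["resource"]},
    "results": {"bindings": [
        {"resource": {"type": "uri",
                      "value": "http://example.org/alberta/letter-1"}},
    ]},
})

print(build_query_url(ENDPOINT, QUERY))
print(extract_values(sample, "resource"))
```

A visualization layer would then take the returned URIs and issue follow-up queries for titles, dates or places to draw on screen.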

The next step is to decide the order in which to take things beyond the proof of concept. Plans include developing visualizations for the remaining dimensions; delivering cross-browser functionality; adding more intelligence to the parsing of query results; making it possible to retrieve and filter by multiple dimensions; creating geographic and timeline visualizations of events and their related resources; and adding resources and metadata from other interested parties, among other efforts.

The “proof-of-concept” metadata is available here under an “Open Data Commons PDDL” license.