The team behind the data integration tool Karma this week presented at LODLAM (Linked Open Data in Libraries, Archives & Museums), illustrating how to map museum data to the Europeana Data Model (EDM) or CIDOC CRM (Conceptual Reference Model). This came on the heels of its earning the best-in-use paper award at ESWC2013 for its publication about connecting Smithsonian American Art Museum (SAAM) data to the LOD cloud.

The work of Craig Knoblock, Pedro Szekely, Jose Luis Ambite, Shubham Gupta, Maria Muslea, Mohsen Taheriyan, and Bo Wu at the Information Sciences Institute, University of Southern California, Karma lets users integrate data from a variety of data sources, hierarchical and dynamic ones too (databases, spreadsheets, delimited text files, XML, JSON, KML and Web APIs), by modeling it according to an ontology of their choice. A graphical user interface automates much of the process. Once the model is complete, users can publish the integrated data as RDF or store it in a database.

The Smithsonian project builds on the group’s work on Karma for mapping structured sources to RDF. For that project (whose announcement we covered previously), Karma converted more than 40,000 of the museum’s holdings, stored in more than 100 tables in a SQL Server database, to LOD, leveraging EDM, the metamodel used in the Europeana project to represent data from Europe’s cultural heritage institutions.
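To give a feel for what converting a relational row to RDF involves, here is a minimal Python sketch. It is not Karma's output or SAAM's actual model: the base URI, the column names, and the column-to-property mapping are all hypothetical, and only the EDM class `edm:ProvidedCHO` and the Dublin Core properties are real vocabulary terms.

```python
# Hypothetical sketch: turn one relational row into N-Triples using EDM/DC terms.
# The base URI, column names, and mapping below are illustrative assumptions,
# not SAAM's schema and not the model Karma generated.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EDM = "http://www.europeana.eu/schemas/edm/"
DC = "http://purl.org/dc/elements/1.1/"
BASE = "http://example.org/saam/object/"  # placeholder namespace

# Assumed mapping from table columns to ontology properties.
COLUMN_TO_PROPERTY = {
    "title": DC + "title",
    "maker": DC + "creator",
    "dated": DC + "date",
}


def escape_literal(value):
    """Escape characters that may not appear raw inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(row):
    """Return a list of N-Triples lines describing one object row."""
    subject = "<{}{}>".format(BASE, row["object_id"])
    lines = ["{} <{}> <{}ProvidedCHO> .".format(subject, RDF_TYPE, EDM)]
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = row.get(column)
        if value:  # skip NULL or empty cells
            lines.append('{} <{}> "{}" .'.format(subject, prop, escape_literal(str(value))))
    return lines


example_row = {"object_id": 101, "title": 'Study of a "Head"', "maker": "Jane Doe", "dated": None}
for line in row_to_ntriples(example_row):
    print(line)
```

A real conversion would also need the filtering and cleaning steps the paper describes, plus links between objects and people rather than plain string literals.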

The museum stores collection metadata in a relational database managed by TMS, a data management system for museums. Once modeled in the EDM ontology, the information was converted into 5-star Linked Data, linked to DBpedia (which provides a gateway to other linked data resources), the Getty Union List of Artist Names, and the NY Times Linked Data. The result: Today, each time users visit an artist page in the Smithsonian American Art Museum (SAAM), a SPARQL query is issued to retrieve links to Wikipedia and the NY Times.
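The kind of SPARQL lookup described above might be assembled along these lines. This is only a sketch: the artist URI is a placeholder, and the use of `owl:sameAs` as the linking predicate is an assumption, since the article does not say how SAAM's links are stored or queried.

```python
# Hypothetical sketch: build a SPARQL query that fetches external links
# for one artist. The artist URI and the choice of owl:sameAs as the
# linking predicate are assumptions made for illustration.

QUERY_TEMPLATE = """\
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?link WHERE {{
  <{artist_uri}> owl:sameAs ?link .
}}"""


def build_links_query(artist_uri):
    """Return a SPARQL SELECT query for the artist's outbound links."""
    # Reject characters that could break out of the <...> IRI reference.
    if any(ch in artist_uri for ch in '<>"{}|^`\\ '):
        raise ValueError("not a safe IRI: %r" % artist_uri)
    return QUERY_TEMPLATE.format(artist_uri=artist_uri)


print(build_links_query("http://example.org/saam/person/123"))
```

The resulting string would then be sent to whatever SPARQL endpoint holds the published data, and the returned links filtered by host to pick out Wikipedia or NY Times targets.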

As the Karma team notes, the SAAM project had a larger goal beyond mapping the SAAM data to Linked Open Data: developing the tools that will enable any museum, or other organization for that matter, to map its data to linked data. “Museums around the world have built databases with metadata about millions of objects, their history, the people who created them, and the entities they represent,” the authors explain in the paper. “This data is stored in proprietary databases and is not readily available for use. Recently, museums embraced the Semantic Web as a means to make this data available to the world, but the experience so far shows that publishing museum data to the linked data cloud is difficult.”

The Artist’s Way

The effort involved extending EDM with subclasses and subproperties to represent attributes unique to SAAM while maximizing compatibility with a large number of existing museum LOD datasets. That was one of the biggest challenges, the authors report, and they point out the need for a library of ontologies for cultural heritage. Mapping SAAM data to RDF was another challenge, both in data preparation (where they had to filter and transform data before modeling it and converting it to RDF) and in mapping columns to classes. “Karma,” they note, “addresses this problem by learning the assignment of semantic types to columns.”
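To illustrate the general idea of suggesting a semantic type for a column from previously labeled examples, here is a deliberately naive Python sketch. It is not Karma's learning method, which the article does not describe; the token-shape features, the cosine similarity, and the sample labels are all assumptions chosen to keep the example small.

```python
from collections import Counter
import math
import re


def features(values):
    """Count coarse value shapes: letter runs become 'a', digit runs become '9'."""
    counts = Counter()
    for value in values:
        shape = re.sub(r"[0-9]+", "9", re.sub(r"[A-Za-z]+", "a", value))
        counts[shape] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_type(column_values, labeled_examples):
    """Return the semantic type whose labeled values look most similar."""
    target = features(column_values)
    return max(labeled_examples, key=lambda t: cosine(target, features(labeled_examples[t])))


# Hypothetical training data: values a user has already assigned to each type.
labeled = {
    "dc:date": ["1923", "1887", "ca. 1900"],
    "dc:creator": ["Jane Doe", "John Smith", "A. Artist"],
}
print(suggest_type(["1950", "1761"], labeled))
```

In an interactive tool, a suggestion like this would be shown to the user for confirmation, and each confirmed assignment would become another labeled example.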

The payoff was worth the work, though. The authors state:

“…the linked data provides access to information that was not previously available. The Museum currently has 1,123 artist biographies that it makes available on its website; through the linked data, we identified 2,807 links to people records in DBpedia, which SAAM personnel verified. The Smithsonian can now link to the corresponding Wikipedia biographies, increasing the biographies they offer by 60%. Via the links to DBpedia, they now have links to the New York Times, which includes obituaries, exhibition and publication reviews, auction results, and more. They can embed this additional rich information into their records, including 1,759 Getty ULAN identifiers, to benefit their scholarly and public constituents.”

Also in the works, the paper says, is an exploration of other ways to use the Linked Data to create compelling museum apps. That might include, the authors write, “a relationship finder application that allows a museum to develop curated experiences, linking artworks and other concepts to present a guided story.”