Last week, we covered the story of how Chris Testa, Director of Engineering at Adly, Inc., brought the Semantic Web to Hollywood. Today, in Part II, Chris shares his recommended five-step process for Linked Data integration.

1. Understand what your “things” are

  • Look for the high-value entities in your system: the ones that bring in revenue and give you a business intelligence edge over competitors (Examples: Advertisers, Brands, Celebrities)
  • Look for models that are growing quickly in your system (For us, it was Celebrities)
  • Look for things that are already well annotated elsewhere, such as popular things in culture & technology

2. Choose a Linked Dataset:

  • DBpedia and Freebase are cornerstones of the Linked Data movement
  • There are tons of specialized datasets in many fields (biomedical, events, news, gov’t, so much more!)
  • Once you link up, linking to more becomes much easier!

3. Reconcile your things:

  • Reconciling is matching the entities in your database with remote linked data sources
  • Freebase’s matchmaker is a really useful tool for reconciling
  • Make it a game, and put experts on it to ensure high-quality datasets
  • Heuristic methods exist for tackling reconciliation queues of 100k+ entities
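
To make the heuristic idea concrete, here is a minimal sketch in Python. It is not Adly's method or Freebase's tooling; it assumes the remote labels have already been downloaded into a plain dictionary, and the function names, the `ex:` identifiers, and the two thresholds are all hypothetical choices for illustration. Confident matches are accepted automatically, borderline ones go to a human review queue (the "experts" above), and the rest stay unmatched.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def similarity(a, b):
    """Return a 0..1 string similarity score between two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def reconcile(local_names, remote_entities, accept=0.92, review=0.75):
    """Split local names into auto-matched, needs-review, and unmatched.

    remote_entities maps a remote identifier to its label.
    The accept/review thresholds are assumptions; tune them on your own data.
    """
    matched, needs_review, unmatched = {}, {}, []
    for name in local_names:
        best_id, best_score = None, 0.0
        for remote_id, label in remote_entities.items():
            score = similarity(name, label)
            if score > best_score:
                best_id, best_score = remote_id, score
        if best_score >= accept:
            matched[name] = best_id          # confident enough to link automatically
        elif best_score >= review:
            needs_review[name] = best_id     # plausible, but a person should confirm
        else:
            unmatched.append(name)           # no reasonable candidate found
    return matched, needs_review, unmatched


# Made-up sample data, for illustration only.
remote = {"ex:tom_hanks": "Tom Hanks", "ex:emma_stone": "Emma Stone"}
local = ["tom  hanks", "E. Stone", "Acme Corp"]
matched, needs_review, unmatched = reconcile(local, remote)
```

Comparing every local name against every remote label is fine for a sketch but slow at the 100k+ scale; in practice you would narrow the candidates first (for example, by type or by first letter) before scoring.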

4. Build business intelligence:

  • Tip: There are really simple things you can do with linked data that are cool!
  • For example, display context to users around reconciled entities in your project. Context makes things easier for users.
  • Index and search on reconciled properties like full name, gender, genre, profession, etc.
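
As a rough illustration of indexing and searching on reconciled properties, the sketch below builds a small in-memory inverted index in Python. It assumes the properties pulled from the linked dataset have already been stored as dictionaries keyed by your own entity IDs; the record contents, field names, and function names are made up for the example, and a real system would more likely use a search engine or database index.

```python
from collections import defaultdict


def build_index(records, fields):
    """Map (field, value) pairs to the set of record IDs that carry them."""
    index = defaultdict(set)
    for record_id, props in records.items():
        for field in fields:
            value = props.get(field)
            if value is None:
                continue
            # A property may hold one value or several (e.g. many professions).
            values = value if isinstance(value, (list, tuple, set)) else [value]
            for v in values:
                index[(field, str(v).lower())].add(record_id)
    return index


def search(index, **criteria):
    """Return IDs of records matching every field=value criterion."""
    result = None
    for field, value in criteria.items():
        ids = index.get((field, str(value).lower()), set())
        result = set(ids) if result is None else result & ids
    return result or set()


# Made-up records standing in for reconciled celebrity entities.
records = {
    "celeb:1": {"full_name": "Example Person One", "gender": "female",
                "profession": ["actor", "singer"]},
    "celeb:2": {"full_name": "Example Person Two", "gender": "male",
                "profession": ["actor"]},
    "celeb:3": {"full_name": "Example Person Three", "gender": "female",
                "profession": ["athlete"]},
}
index = build_index(records, ["gender", "profession"])
female_actors = search(index, gender="female", profession="actor")
```

Here `female_actors` contains only `celeb:1`, the one record that satisfies both criteria.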

5. Feedback & maintenance

  • Users won’t trust the data unless it is manicured.
  • Add lots of negative feedback loops (Unlike buttons!) to make sure that users are heard.
  • A few minutes a day of cleanup does wonders!

Chris Testa, Ad.ly

See Chris’ SemTech 2011 presentation on SlideShare: How Hollywood Learned to Love the Semantic Web

Additional Reporting by Jennifer Zaino with contributions from Chris Testa, Director, Engineering, Adly, Inc.