Last week, we covered the story of how Chris Testa, Director of Engineering at Ad.ly, Inc., brought the Semantic Web to Hollywood. Today, in Part II, Chris shares his recommended five-step process for Linked Data integration.
1. Understand what your “things” are
Look for the high-value entities in your system: the ones that bring in money and give you a business intelligence edge over competitors (Examples: Advertisers, Brands, Celebrities)
Look for models that are growing quickly in your system (For us, it was Celebrities)
Look for things that are already well annotated, such as popular things in culture and technology
2. Choose a Linked Dataset:
DBpedia and Freebase are cornerstones of the Linked Data movement
There are many specialized datasets across fields such as biomedicine, events, news, government, and more
Once you link up, linking to more becomes much easier!
3. Reconcile your things:
Reconciling is matching the entities in your database with remote linked data sources
Freebase’s matchmaker is a really useful tool for reconciling
Make it a game, and put experts on it to ensure high-quality datasets
Heuristic methods exist for tackling reconciliation queues of 100,000 or more entities
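To make the heuristic idea in step 3 concrete, here is a minimal sketch of one possible approach. It is not the method Chris describes: it simply scores local names against candidate labels with fuzzy string matching from Python's standard library, accepts confident matches automatically, and sends borderline ones to a human review queue. The function names, the thresholds, and the `ex:` identifiers are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences don't lower the score."""
    return " ".join(name.lower().split())


def similarity(a, b):
    """Return a 0.0-1.0 similarity ratio between two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def reconcile(local_name, candidates, accept=0.9, review=0.6):
    """Pick the best remote candidate for a local entity name.

    candidates maps a remote identifier to its label (hypothetical data).
    Returns (status, remote_id, score), where status is "matched",
    "needs_review", or "unmatched". The thresholds are illustrative guesses.
    """
    best_id, best_score = None, 0.0
    for remote_id, label in candidates.items():
        score = similarity(local_name, label)
        if score > best_score:
            best_id, best_score = remote_id, score
    if best_score >= accept:
        return ("matched", best_id, best_score)
    if best_score >= review:
        return ("needs_review", best_id, best_score)
    return ("unmatched", None, best_score)


# Made-up candidate labels standing in for a remote linked dataset.
candidates = {"ex:Jane_Doe": "Jane Doe", "ex:John_Roe": "John Roe"}
for name in ["jane  DOE", "J. Doe", "Zzzz Qqqq"]:
    print(name, "->", reconcile(name, candidates))
```

In a setup like this, only the "needs_review" bucket would go to your experts (or your game), which keeps the human queue small even when the overall queue is large.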
4. Build business intelligence:
Tip: There are really simple things you can do with linked data that are cool!
For example, display context to users around reconciled entities in your project. Context makes things easier for users.
Index and search on reconciled properties like full name, gender, genre, profession, etc.
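As a rough illustration of indexing and searching on reconciled properties, the sketch below builds a small in-memory inverted index. The entity identifiers, property names, and values are invented sample data, and a production system would more likely use a real search engine; this only shows the shape of the idea.

```python
from collections import defaultdict


def build_index(entities):
    """Map each (property, value) pair to the set of entity ids that have it."""
    index = defaultdict(set)
    for entity_id, props in entities.items():
        for prop, value in props.items():
            # Treat single values and lists of values the same way.
            values = value if isinstance(value, (list, tuple, set)) else [value]
            for v in values:
                index[(prop, str(v).lower())].add(entity_id)
    return index


def search(index, **filters):
    """Return the ids of entities matching every property=value filter."""
    result = None
    for prop, value in filters.items():
        ids = index.get((prop, str(value).lower()), set())
        result = set(ids) if result is None else result & ids
    return result if result is not None else set()


# Hypothetical reconciled records; none of this is real data.
entities = {
    "celeb:1": {"full_name": "Jane Doe", "gender": "female", "profession": ["actor", "singer"]},
    "celeb:2": {"full_name": "John Roe", "gender": "male", "profession": ["actor"]},
}
index = build_index(entities)
print(search(index, profession="actor"))
print(search(index, profession="actor", gender="female"))
```

Because the properties come from the linked dataset rather than from manual entry, every newly reconciled entity becomes searchable by these fields with no extra data entry.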
5. Feedback & maintenance
Users won’t trust the data unless it is manicured.
Add lots of negative feedback loops (Unlike buttons!) to make sure that users are heard.
A few minutes a day of cleanup does wonders!
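One way to picture the negative feedback loop in step 5 is a small log of "unlike" clicks that feeds a daily cleanup queue. The sketch below is an assumption about how that could look, not a description of Ad.ly's system; the class name, the threshold, and the identifiers are hypothetical.

```python
from collections import Counter


class FeedbackLog:
    """Collects "unlike" flags and surfaces the most-flagged facts for cleanup."""

    def __init__(self, threshold=3):
        # Number of flags a fact needs before it enters the cleanup queue
        # (an illustrative default, not a recommended value).
        self.threshold = threshold
        self.flags = Counter()

    def unlike(self, entity_id, prop):
        """Record that a user flagged one property of one entity as wrong."""
        self.flags[(entity_id, prop)] += 1

    def cleanup_queue(self):
        """Return flagged (entity_id, property) pairs, most-flagged first."""
        return [item for item, count in self.flags.most_common()
                if count >= self.threshold]


log = FeedbackLog(threshold=2)
log.unlike("celeb:1", "profession")
log.unlike("celeb:1", "profession")
log.unlike("celeb:2", "gender")
print(log.cleanup_queue())
```

Sorting by flag count means the few minutes of daily cleanup go to the facts users complain about most.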
See Chris’ SemTech 2011 presentation on SlideShare, How Hollywood Learned to Love the Semantic Web: http://slidesha.re/mhXXOJ
Additional Reporting by Jennifer Zaino with contributions from Chris Testa, Director, Engineering, Adly, Inc.