Q&A Session for “Introduction to Linked Data” webcast

Q: Can you go back to more clearly describe RDF?

A: You might have a look at this short introduction to RDF (note the date!) written by my business partner, Uche Ogbuji:
http://www.ibm.com/developerworks/library/w-rdf/

A longer and more complete introduction is available here:
http://research.talis.com/2005/rdf-intro/

Q: Are Apple apps already using SPARQL and LOD at the consumer level, or are they using a different method?

A: Apple apps use many of the W3C standards, but as far as I know they are not using SPARQL or LOD techniques. The big announcements around RDF usage by major consumer-oriented companies in the past year have been Google and Yahoo’s support for parsing RDFa from Web pages and Best Buy’s use of RDFa to increase their page ranking on those search engines.

Q: How does SPARQL fit with RDF?

A: SPARQL is a query language for distributed RDF data in the same way that SQL is a query language for relational databases.

If you want to see how to create SPARQL queries for real, try these:
http://www.cambridgesemantics.com/2008/09/sparql-by-example/
http://www.slideshare.net/ldodds/sparql-tutorial
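
To make the SQL comparison a little more concrete, here is a rough Python sketch of what a single SPARQL triple pattern does: it matches a pattern containing variables against a set of (subject, predicate, object) statements. The `match` function and the example.org data are made up for illustration; a real SPARQL engine does far more (joins, filters, remote sources).

```python
# Toy illustration only: a real SPARQL engine supports joins, filters,
# optional patterns and remote data sources.

# Each RDF statement is a (subject, predicate, object) triple.
# The example.org URIs below are made-up sample data.
TRIPLES = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
    ("http://example.org/bob", "http://xmlns.com/foaf/0.1/name", "Bob"),
]


def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern.

    Terms starting with '?' are variables, as in SPARQL; anything else
    must equal the corresponding part of the triple exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated in the pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?person ?name WHERE { ?person foaf:name ?name }
names = match(("?person", "http://xmlns.com/foaf/0.1/name", "?name"), TRIPLES)
```

Where SQL selects rows from named tables, the pattern here selects statements from a graph, and the variables play the role of the selected columns.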

Q: Can you also elaborate more on how LOD can overcome scalability issues?

A: Linked Data approaches consist of standards, formats, tools and techniques to query, resolve and analyze distributed data on the World Wide Web. Linked Data is based completely on the standards of the Web. The Web is the largest and most complex information system ever fielded because of scalability principles built into those standards. Roy Fielding, in his doctoral dissertation, captured and analyzed the properties that make the Web scalable. He called the collection of those properties Representational State Transfer (REST). Linked Data is built on REST. Roy’s dissertation is quite readable for a thesis and may be found at:
Fielding, R.T. (2000). Architectural Styles and the Design of Network-based Software Architectures. Doctoral dissertation, University of California, Irvine. http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

Q: When do you have to use absolute URIs in RDF?

A: There are several different ways to store RDF information: a handful of commonly used document formats (RDF/XML, OWL/XML, Turtle, N3, RDFa and raw triples; I prefer Turtle), plus the SPARQL Query Results XML Format and the SPARQL query language itself. Most RDF systems exchange information using one or more of those formats. The syntax of the particular format dictates whether URIs must be absolute or whether they may be shortened. Most (e.g. RDF/XML, OWL/XML, RDFa, Turtle, N3, SPARQL result sets and the SPARQL query language) provide some mechanism to shorten URIs (variously called "namespaces" or "compact URIs").
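
As a rough sketch of how such shortening works, the Python below expands a prefixed name back into an absolute URI. The `expand` helper and the `PREFIXES` table are hypothetical names for illustration; each format declares its prefixes with its own syntax.

```python
# Hypothetical prefix table; formats such as Turtle and SPARQL declare
# these mappings with their own syntax (e.g. "@prefix" or "PREFIX").
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(term, prefixes=PREFIXES):
    """Expand a shortened name like 'foaf:name' into an absolute URI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    # Unknown prefix, or already absolute (e.g. 'http://...'): leave as is.
    return term


full = expand("foaf:name")
```

The shortened and absolute forms name the same thing; the prefix only saves typing.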

Q: Data inflation – great slide!

A: Thanks!

Q: What is your understanding of the relationship between the semantic web and the human mind (e.g., what practices promote learning, development of expert-like knowledge architecture, and generative thinking?)

A: There may be a good reason to use an associative model to describe information: Human memories have been claimed to be associative in nature [Collins 1975], and recent functional magnetic resonance imaging studies lend credence to that view [Mitchell 2008]. See the following for details if you don’t already know them:

Collins, A.M. and Loftus, E.F. (1975, November). A Spreading-Activation Theory of Semantic Processing. Psychological Review, 82, pp. 407-428.

Mitchell, T.M., Shinkareva, S.V., Carlson, A., Chang, K.-M., Malave, V.L., Mason, R.A. and Just, M.A. (2008, May 30). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science, 320 (5880), pp. 1191-1195.

Q: I’m intrigued by the enabling of discovery; can you point to an application demonstrating it in a LOD project?

A: Sure. Try these:
http://www.zotero.org/
http://simile.mit.edu/wiki/Piggy_Bank

For the more technically minded, see the W3C’s expose on The Self-Describing Web: http://www.w3.org/2001/tag/doc/selfDescribingDocuments

Q: Seems like most of the projects on Linked Data are academic/non-profit projects. Why are there not more commercial projects on Linked Data?

A: There are many commercial projects using Linked Data; they just tend to be more circumspect. Some notable exceptions are Google and Yahoo’s support for parsing RDFa from Web pages and Best Buy’s use of RDFa to increase their page ranking on those search engines. The New York Times was a welcome addition. The BBC both provides data and uses it internally. The forthcoming book I mentioned from Springer (Linking Enterprise Data) will have more.

It is worth noting that Linked Open Data (with a focus on "open") does not appeal to most businesses. That doesn’t mean that many businesses aren’t exploring or actively using Linked Data techniques.

Q: Just a comment: how we understand these relationships inside our heads is referred to as structural knowledge. This is also the underlying idea behind concept maps.

A: Right. The same psychological research is behind RDF.

Q: What new security concerns do you see appearing as the web supports more semantic data/queries?

A: There are several significant challenges for information security. Some of them are:

  • Changing DNS domain holders. If you query some RDF at a given DNS domain for a long period of time, how do you know whether the DNS domain changes hands? You might one day be querying a different resource controlled by someone else.
  • Internationalized Resource Identifiers (IRIs). Intended as the internationalized replacement for URIs (so, e.g., Chinese people could have Web addresses in Chinese), IRIs are a boon to black hats (which has slowed their adoption). Consider clicking a link that reads walmart.com but goes elsewhere because the characters in the address *look* like US-ASCII but actually come from another alphabet.
  • URI curation. Systems like PURLs (see http://purlz.org) allow management of the URI resolution process. This can be a benefit for security, or a detriment, depending on who does the curation.
  • Lack of technical trust mechanisms. We solve almost every issue of trust on the Web today socially. If Linked Data clients follow links (semi)automatically, how will they know when they are beyond trust boundaries?
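
The look-alike address problem in the IRI item can be sketched in a few lines of Python. This is only an illustration under assumptions: the host names are examples, `non_ascii_chars` is a made-up helper, and flagging every non-ASCII character is far too blunt for real use, since legitimate internationalized names contain them too.

```python
import unicodedata

ascii_name = "walmart.com"
# Same visible text, but the second letter is CYRILLIC SMALL LETTER A (U+0430).
lookalike = "w\u0430lmart.com"


def non_ascii_chars(hostname):
    """List (character, Unicode name) pairs for every non-ASCII character."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in hostname if ord(ch) > 127]


suspicious = non_ascii_chars(lookalike)

# The ASCII ("xn--...") form that DNS actually sees differs from the
# plain ASCII name, even though the two strings look alike on screen.
wire_form = lookalike.encode("idna")
```

A client that showed the user the encoded form, or the names of the unexpected characters, would at least make the substitution visible.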

Q: Recommendations for the best way to start?

A: I recently surveyed a good number of big companies who successfully fielded Linked Data solutions into production (I notice you are from a big company). Successful organizations had at least three things in common:

  • They all had at least one Semantic Web expert on staff.
  • They all worked on at least one real business problem (not a prototype or proof-of-concept).
  • They all leveraged existing investments, especially those with implied semantics (such as wikis or email).

Q: Are there performance optimizations available when working with RDF data? Can a PB of RDF data be queried in real time?

A: Absolutely yes. Querying an RDF database is going to be much faster than querying a bunch of RDF documents in XML format. Some Open Source RDF databases to look at include:
http://mulgara.org/
http://www.openrdf.org/

If you want to pay, try these:
http://www.talis.com/platform/
http://topquadrant.com/products/TB_Suite.html
http://www.openlinksw.com/

Q: The mixing to which you just referred seems to imply "trusted" sources. Could you discuss?

A: Sure. Just like with anything on the Web, the best kind of trust is socially-developed trust. We think we can trust Google to deliver objective search results. We think we can trust Amazon to give us real used-book prices. Similarly, we think we can trust major stores, publishers and governments to describe their own data well. We may be less sure of sites we don’t know. Some of the sites on the Linked Open Data cloud are very trustworthy (such as the scientific ones or the New York Times) and others less so, perhaps due to their underlying data sets (such as DBpedia’s scraping of Wikipedia). You can trust Wikipedia for some things (such as "hard" information like the periodic table or the location of Massawa) but not so much with regard to contentious subjects like climate change or political figures.

When you write SPARQL queries, you have to name your data sources. You therefore get to choose who you trust and for what.
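
One common way to name sources is SPARQL's FROM clause. The Python sketch below just assembles such a query as text; `build_query` is a hypothetical helper and the example.org addresses are placeholders for whichever sources you decide to trust.

```python
def build_query(trusted_sources):
    """Build a SPARQL query that reads only from the listed graph URIs."""
    from_clauses = "\n".join(f"FROM <{uri}>" for uri in trusted_sources)
    return (
        "SELECT ?s ?p ?o\n"
        f"{from_clauses}\n"
        "WHERE { ?s ?p ?o }\n"
        "LIMIT 10"
    )


# Placeholder addresses: substitute the sources you actually trust.
query = build_query([
    "http://example.org/store/products.rdf",
    "http://example.org/publisher/books.rdf",
])
```

Leaving a source out of the list is how you decline to trust it for this particular question.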

Q: Anything on how to reuse vocabularies? The alphabet soup makes finding the right schema or OWL ontology just as bad as finding a webpage was in the early days of the web…

A: Yes, it does. There have been several attempts to make sense of the soup by allowing people to look up terms and vocabularies, but none have become dominant yet. A summary of the state of play (a bit dated) is at:
http://www.semanticfocus.com/blog/entry/title/semantic-web-search-engine-roundup/

Some to look at are:
http://www.sindice.com/
http://swoogle.umbc.edu/

Q: If we use the web as a database, are there any tools that map the schema, attributes, and attribute properties of the linked data?

A: Good question! Schemas on the Semantic Web are composed of the predicates (the URI-addressable terms linking two things) and additional information describing those predicates. When one creates a SPARQL query, one explicitly lists the Web addresses of the data sources to query (because you couldn’t practically query the entire Web...). Putting those two statements together, it is possible to query your identified data sources for just the predicates they contain, and then for the information about those predicates. That would give you the schema, attributes and attribute properties for those data sources. So, the tool you need is simply a SPARQL endpoint that will accept the SPARQL query you need to write.
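
A minimal sketch of those two steps, written as Python that only builds the query text and a request URL (nothing is sent). The endpoint address and the helper names are assumptions for illustration; the `query` parameter is the one defined by the SPARQL protocol for GET requests.

```python
from urllib.parse import urlencode

# Step 1: which predicates does the source use?
PREDICATES_QUERY = "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"


def describe_predicate_query(predicate_uri):
    """Step 2: what does the source say about one of those predicates?"""
    return f"SELECT ?property ?value WHERE {{ <{predicate_uri}> ?property ?value }}"


def request_url(endpoint, query):
    """Encode a query as a SPARQL protocol GET request (the 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder endpoint: substitute a real SPARQL endpoint address.
url = request_url("http://example.org/sparql", PREDICATES_QUERY)
```

Running the second query once for each predicate returned by the first would give a rough picture of the schema, limited to whatever descriptions the source actually publishes.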

Q: Relate at what level of granularity? Page to page or idea unit to idea unit?

A: Both. Neither. It depends :) I suggested during the webinar that one not try to solve all problems from a top-down perspective. Instead, publishing just the data (and just the relationships) that one needs to solve a particular problem seems to work best, especially in larger teams of people (building top-down consensus can take a long time!).

In your particular case with Hylighter (if that was your question), you might consider objects like people, comments, documents and times so you could perform queries like "show me comments made by Peter on document x between 2:00 and 4:00". Capturing subjects or topics would be harder in your free-form environment, but some people use server-side entity extractors to try things like that. They sometimes work.
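
As a rough illustration of that kind of modeling, here is the example query expressed over plain Python objects rather than RDF. The `Comment` class, its field names and the sample data are all invented for this sketch and say nothing about how Hylighter actually stores its data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Comment:
    author: str
    document: str
    made_at: datetime
    text: str


def comments_by(comments, author, document, start, end):
    """Return comments by `author` on `document` made within [start, end]."""
    return [
        c for c in comments
        if c.author == author and c.document == document and start <= c.made_at <= end
    ]


# Invented sample data.
comments = [
    Comment("Peter", "x", datetime(2010, 5, 1, 14, 30), "Nice point."),
    Comment("Peter", "x", datetime(2010, 5, 1, 17, 0), "Too late to count."),
    Comment("Mary", "x", datetime(2010, 5, 1, 15, 0), "I agree."),
]

# "Show me comments made by Peter on document x between 2:00 and 4:00"
hits = comments_by(comments, "Peter", "x",
                   datetime(2010, 5, 1, 14, 0), datetime(2010, 5, 1, 16, 0))
```

Publishing the same four things (person, comment, document, time) as linked data would let a SPARQL query ask the same question.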

Q: Would you recommend a specific RDF, etc. authoring tool (WYSIWYG or otherwise) or is a good old text editor (along with heavy dose of "copy and paste" from existing RDF docs) still the best way to go?

A: I used to joke that programmers in my company could use any IDE they chose: vi or emacs. Text editors work just fine. You may have heard Eric Franzon say that he used Dreamweaver to add RDFa to the Web site he developed. I’ve seen demos of TopQuadrant’s TopBraid Composer (http://topquadrant.com/products/TB_Suite.html), which seems nice if you like a graphical environment. For ontology development, some people prefer Protege (http://protege.stanford.edu/), but I like SWOOP (http://code.google.com/p/swoop/) for its better debugging capabilities. The Eclipse IDE and the Oxygen XML editor also have some support. It really depends which of the many possible jobs you are trying to accomplish and the kind of environment you feel most comfortable in.

NOTE from Eric Franzon: Yes, I did use DreamWeaver, a text editor, and some heavy use of Copy/Paste.

Q: Reusing terms/names is all very well, but it’s important to understand the MEANING, e.g. is one ‘customer’ term the same as another?

A: Absolutely! Choosing terms to use on the Semantic Web is equivalent to choosing terms in any other information processing system. You do need to be careful to say what you mean. Fortunately, RDF terms are resolvable on the Web itself (by following terms’ URIs). Each RDF term should provide the ability for a user to read exactly what the author of the term meant it to mean. That situation is better than the short and ambiguous meanings of terms generally associated with IT systems (such as relational database schemas or spreadsheet column names).
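
Following a term's URI can be sketched with Python's standard library. The code below only prepares the HTTP request and does not send it; `term_request` is a hypothetical helper, and whether a given server honors the Accept header and returns RDF is an assumption that varies by vocabulary.

```python
from urllib.request import Request


def term_request(term_uri):
    """Prepare (but do not send) an HTTP request for a term's definition."""
    # Drop any '#fragment': HTTP clients fetch the enclosing document.
    document_uri = term_uri.split("#", 1)[0]
    return Request(document_uri, headers={"Accept": "application/rdf+xml, text/turtle"})


req = term_request("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
```

Passing `req` to `urllib.request.urlopen` would fetch the document in which the author of the term describes what it means.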

Q: Are there query clients for the semantic web?

A: Sure, although at this point most programmers are making their own. There is no de facto standard tool in use by a dominant number of people. You might have a look at these for enterprise use:
http://www.talis.com/platform/
http://topquadrant.com/products/TB_Suite.html
http://www.openlinksw.com/

If you just want to try a few queries for yourself, try these:
http://demo.openlinksw.com/sparql
http://www.sparql.org/query.html
http://hyperdata.org/sparql/demo/
http://data.semanticweb.org/snorql/
http://dbpedia.org/sparql

To get some data to play with, try here: http://esw.w3.org/SparqlEndpoints
