In Part I of this two-part series, Dean Allemang and Scott Henninger draw on years of teaching TopQuadrant’s introductory course on the Semantic Web to make some observations on teaching Semantic Web concepts to a wide variety of students.

Whenever we learn something new, there is a tendency to relate it to things we already know.  This is how we bring our experience to bear on new things; the better we can apply what we already know, the more quickly we can come up to speed on new things.   But when it comes to learning a new technology, this tendency can also be a handicap.  A new technology is only interesting if it does something different from existing technologies.  Viewing everything as an old trick in new clothing conceals exactly the interesting aspects of a new technology.

This tendency poses a challenge when teaching any new technology (and Semantic Web technology in particular); the key to communicating the innovative value of the new technology lies exactly at the point where previous experience fails to apply. Students sign up for a course in the Semantic Web because they sense that there is real innovation here. But that means that it will necessarily be unlike whatever was available in their technological experience.

Web technologies need to be accessible to non-technical people as well as technical people. Among techies, the Semantic Web bears some similarity to a number of common technologies, including relational databases, XML, and object-oriented design. Many disciplines outside of information technology also have a key impact on the Semantic Web, including linguistics, philosophy, and library science. Students with experience in any of these related disciplines (or, as is often the case, in several of them) will approach the subject with different assumptions.

The “Aha!” Moments
While the Semantic Web standards are fundamentally very simple (e.g., RDF could be described as the mathematically simplest way to represent data), they are also highly innovative.  These innovations challenge different assumptions from each of the related fields from which students may draw their experiences.  Over and over again in our teaching, we find that students will have distinct "aha!" moments, when they realize how a Semantic Web approach causes them to think in a new way about some familiar aspect of a technology or approach they know well.  These "aha!" moments are the key to teaching and learning the Semantic Web; once a student "gets it," it is easy to learn the details of the technology.

But the nature of the "aha!" differs for students of varying backgrounds, and for each of the Semantic Web technologies RDF, SPARQL, RDFS and OWL.  We’ll share some of the more common and illuminating "aha!" moments for each technology.

RDF
The simple message of RDF is that you can represent linked data in a global way by structuring global identifiers (URIs) into triples.  This is really a quite simple idea, but is nevertheless a source of many "aha!" moments.
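
To make the idea concrete, here is a minimal sketch that models a graph as a plain Python set of (subject, predicate, object) tuples. It is a toy model rather than an RDF library, and the `http://example.org/` namespace and the individual names are assumptions chosen for illustration.

```python
# Toy model: an RDF graph as a set of (subject, predicate, object) triples.
# The example.org URIs are made up for illustration.
EX = "http://example.org/"

graph_a = {
    (EX + "JohnKennedy", EX + "hasFather", EX + "JoeKennedy"),
}
graph_b = {
    (EX + "RobertKennedy", EX + "hasFather", EX + "JoeKennedy"),
    (EX + "JohnKennedy", EX + "hasFather", EX + "JoeKennedy"),  # same statement again
}

# Because the identifiers are global, merging two graphs is just set union:
# identical triples collapse, and shared URIs link the data together.
merged = graph_a | graph_b
print(len(merged))  # 2
```

Because both sources use the same identifier for Joe Kennedy, no key mapping or join step is needed to combine what they say about him.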

For people with a background in relational databases, this innovation lets them "think outside the table."  They often first see RDF as a way to make it as easy to add a column to a table as it is to add a row.  The "aha!" happens when they realize that they don’t have to think in columns and rows, and so they can manage non-tabular data just as easily.  Many database engineers have gone so far as to invent RDF in one form or another.  For these engineers, the RDF standard, and out-of-the-box RDF databases, allow them to buy this part of their solution rather than build it.
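
The "think outside the table" shift can be sketched in a few lines. The helper `row_to_triples`, the `person` table, and its columns are hypothetical; the sketch only assumes that each cell of a row becomes one triple.

```python
EX = "http://example.org/"  # hypothetical namespace

def row_to_triples(table, key_column, row):
    """Turn one table row (a dict) into triples: one per non-key, non-empty cell."""
    subject = EX + table + "/" + str(row[key_column])
    return {
        (subject, EX + column, value)
        for column, value in row.items()
        if column != key_column and value is not None
    }

rows = [
    {"id": 1, "name": "Alice", "city": "Boston"},
    {"id": 2, "name": "Bob", "city": None},  # an empty cell simply yields no triple
]

graph = set()
for row in rows:
    graph |= row_to_triples("person", "id", row)

# Adding a "column" for a single subject is just adding one more triple;
# no other row is affected and no schema has to change first.
graph.add((EX + "person/2", EX + "nickname", "Bobby"))
```

Once the data is in this form, nothing forces it to stay tabular: a subject can have any number of properties, including ones no other subject has.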

The "aha!" for XML practitioners comes with the shift from thinking about hierarchically structured data in documents to thinking of data as a distributed data structure. The familiar XML tools (e.g., XPath and XSLT) work well on documents, but require the programmer to implement any mechanism for links between documents. In RDF, a document (e.g., in RDF/XML) is just an ephemeral representation of a linked data structure, not a resource in itself.

SPARQL
SPARQL is the query language of RDF and is based on a simple idea: you can query information from a data source by building a pattern to match against it. In this case, the data source is an RDF graph, so the query is a graph pattern. Understanding the syntax for specifying a graph pattern in SPARQL is a significant barrier for many, but once the "aha!" is achieved, students often become fascinated with their newfound ability to query data easily; it is often described as fun!

The idea of a query language is not new to most technologists; XML has XQuery and XSLT, which generalize tree patterns, and relational databases have SQL. Nevertheless, there is a common "aha!" experience, even among those very familiar with query languages. One student explained his revelation simply: "I get it now! It’s all about the triples!" Before this enlightenment, he had tried to understand a SPARQL query in the familiar database terms of foreign keys and primary keys, and expected something in the query to match them up (as is common practice in SQL). He was effectively blinded by the simplicity of SPARQL. For a dataset that included information like "John Kennedy has father Joe Kennedy," a SPARQL query to find Joe’s children is simply "Who has father Joe Kennedy?" This student’s "aha!" moment came when he realized he did not need to translate the question into a multi-table data structure; he could just ask a question that looks like the data. We’ve seen similar, but less dramatic, "aha!" moments from object-oriented programmers when they realize that they don’t actually have to know how a query engine works in order to use it.
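
The "question that looks like the data" can be illustrated with a small pattern matcher. This is a sketch of the idea only, not how a SPARQL engine is implemented; the function `match`, the short names standing in for URIs, and the extra facts about Robert and Caroline Kennedy are assumptions added for the example. In SPARQL itself, the first question would read roughly `SELECT ?child WHERE { ?child :hasFather :JoeKennedy }`.

```python
def match(graph, patterns, binding=None):
    """Yield variable bindings under which every pattern is a triple in the graph.

    Terms that start with "?" are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(graph, rest, candidate)

# Short names stand in for full URIs to keep the example readable.
graph = {
    ("JohnKennedy", "hasFather", "JoeKennedy"),
    ("RobertKennedy", "hasFather", "JoeKennedy"),
    ("CarolineKennedy", "hasFather", "JohnKennedy"),
}

# "Who has father Joe Kennedy?"
children = sorted(
    b["?child"] for b in match(graph, [("?child", "hasFather", "JoeKennedy")])
)

# A two-triple pattern: whose father has father Joe Kennedy?
grandchildren = sorted(
    b["?gc"]
    for b in match(graph, [("?gc", "hasFather", "?p"), ("?p", "hasFather", "JoeKennedy")])
)
```

Here `children` is `["JohnKennedy", "RobertKennedy"]` and `grandchildren` is `["CarolineKennedy"]`. The second query joins two triples through the shared variable `?p`, with no keys or tables in sight.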

RDFS
RDFS introduces the notion of inferencing to the Semantic Web stack, in a gentle way.  But one of the most powerful innovations of RDFS is the fact that the schema language (RDFS) is represented in the same form as the data language (RDF).  At first blush, this appears to be a technical detail of the language design.  But it has profound ramifications in the use of RDFS.

This is a subtle point, and sometimes the "aha!" comes only after the student has spent some time working with RDF and RDFS. It usually grows out of frustration with the limitations of data warehousing approaches, where information can be linked together, but only in a static way. The shift in mindset happens when students consider a truly agile application in which, for example, the user interface is configured by a query against the schema and populated with data from the same triple store. The application even continues to work well when new data, and new schema, are merged into the system.
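
A schema-driven interface of this kind can be sketched as follows. The helpers `form_fields` and `render`, and the `Person` properties, are hypothetical, and reading `rdfs:domain` as "this property gets a form field" is a simplification made for the example.

```python
# Schema and data live in one set of triples (all names are illustrative).
graph = {
    # schema
    ("name", "rdfs:domain", "Person"),
    ("name", "rdfs:label", "Name"),
    ("email", "rdfs:domain", "Person"),
    ("email", "rdfs:label", "E-mail"),
    # data
    ("alice", "rdf:type", "Person"),
    ("alice", "name", "Alice"),
    ("alice", "email", "alice@example.org"),
}

def form_fields(graph, cls):
    """Ask the schema which properties apply to cls, with their display labels."""
    props = sorted(s for s, p, o in graph if p == "rdfs:domain" and o == cls)
    labels = {s: o for s, p, o in graph if p == "rdfs:label"}
    return [(prop, labels.get(prop, prop)) for prop in props]

def render(graph, subject, cls):
    """Fill the schema-driven fields with the subject's data."""
    values = {p: o for s, p, o in graph if s == subject}
    return [(label, values.get(prop)) for prop, label in form_fields(graph, cls)]

before = render(graph, "alice", "Person")

# Merging new schema and new data requires no change to the code above.
graph |= {
    ("phone", "rdfs:domain", "Person"),
    ("phone", "rdfs:label", "Phone"),
    ("alice", "phone", "555-0100"),
}
after = render(graph, "alice", "Person")
```

After the merge, `after` contains a third field, `("Phone", "555-0100")`, even though `render` was written before the `phone` property existed.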

For object-oriented programmers, understanding RDFS classes is often a significant barrier. Using Venn diagrams to show class subsumption relationships can trigger the "aha!" needed to move from thinking of classes as templates for instances to thinking of instances as declaring, or being inferred to have, their class membership.
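
The set-inclusion view of classes can be sketched as a single inference rule applied until nothing new appears: if x has type C and C is a subclass of D, then x also has type D. The function `infer_types` and the class names are assumptions for illustration; a real RDFS reasoner applies more rules than this one.

```python
def infer_types(graph):
    """Repeat until no change: (x type C) and (C subClassOf D) imply (x type D)."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        subclass_of = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in inferred if p == "rdf:type"]
        for x, c in typed:
            for sub, sup in subclass_of:
                if c == sub and (x, "rdf:type", sup) not in inferred:
                    inferred.add((x, "rdf:type", sup))
                    changed = True
    return inferred

graph = {
    ("Surgeon", "rdfs:subClassOf", "Physician"),
    ("Physician", "rdfs:subClassOf", "Person"),
    ("kildare", "rdf:type", "Surgeon"),  # the only membership that is asserted
}

types = sorted(
    o for s, p, o in infer_types(graph) if s == "kildare" and p == "rdf:type"
)
```

Only the `Surgeon` membership is stated; `types` ends up as `["Person", "Physician", "Surgeon"]`. Nothing was instantiated from a class template. The instance simply falls inside each enclosing circle of the Venn diagram.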

OWL
OWL is the most elaborate of the Semantic Web languages and is the one that makes the most use of formal logic in its specification.  The notion of inference and proof is central to the understanding of OWL.  But it is still possible to use OWL without having an advanced sophistication in mathematical logic.

"Aha!" moments for OWL are fairly rare; for those already familiar with mathematical logic (e.g., from a background with programming in PROLOG), the concepts of OWL are well-known.  For those without such a background, OWL is not the easiest introduction to the subject of mathematical logic.  For those accustomed to RDBs, OWL looks like a way to define database views in a strongly interconnected way.  The power of a logical system in this context becomes apparent when one tries to define relationships between these views; a logical language like OWL makes this possible in a way that is impractical for ordinary database views.

For object-oriented programmers, OWL poses the biggest conceptual challenge. Because OWL relies on an inference model for its meaning, rather than the procedural model that is the norm for object-oriented programming languages, many of the familiar concepts in object-oriented methodology behave very differently. In particular, inheritance, a key notion in object-oriented design, does not appear as such in OWL. A key insight that object-oriented designers gain while considering OWL is how their familiar inheritance behavior emerges, or fails to emerge, from basic logical principles.
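
The difference between an inference model and a procedural one can be hinted at with a toy example. It imitates one OWL construct, a class defined as equivalent to "everything whose `hasFather` value is `JoeKennedy`", using a plain Python dictionary. The class name `ChildOfJoe`, the function `apply_has_value`, and the dictionary encoding are assumptions for illustration; this is not an OWL reasoner and makes a single pass over the asserted facts.

```python
# Hypothetical definition: ChildOfJoe is equivalent to "hasFather value JoeKennedy".
definitions = {"ChildOfJoe": ("hasFather", "JoeKennedy")}

def apply_has_value(graph, definitions):
    """Apply each class definition in both directions, once over the asserted facts."""
    inferred = set(graph)
    for cls, (prop, value) in definitions.items():
        for s, p, o in graph:
            if p == prop and o == value:
                inferred.add((s, "rdf:type", cls))  # the facts imply membership
            if p == "rdf:type" and o == cls:
                inferred.add((s, prop, value))      # membership implies the facts
    return inferred

graph = {
    ("JohnKennedy", "hasFather", "JoeKennedy"),
    ("EdwardKennedy", "rdf:type", "ChildOfJoe"),
}
result = apply_has_value(graph, definitions)
```

No constructor ever runs: `JohnKennedy` becomes a member of `ChildOfJoe` because of what is known about him, and `EdwardKennedy` acquires a father because of the class he is declared to belong to. What an object-oriented programmer would call inherited behavior is here a consequence of the class definition.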

Conclusions
Any interesting new technology provides innovations that challenge the mindset of the technologies that precede it. The Semantic Web is particularly disruptive, partly because it embraces the unconventional aspects of the World Wide Web and applies them to data. The real challenge in teaching an innovative new technology lies in massaging the intuitions of those who are familiar with previous technology, so that they can draw on their experience while also reaping the benefits of innovation. This is no easy task, for the instructor or the student. We have only scratched the surface of the main innovations of the Semantic Web stack and of how they result in educational barriers.

Education is a key factor in technology adoption; in fact, it plays a more influential role in adoption than technical soundness. The W3C has done a fine job of dividing up the Semantic Web technologies so that it is possible to teach each one independently (and, correspondingly, to adopt them independently as well). In the next installment, we will examine how these insights about Semantic Web education affect the adoption of Semantic Web technologies.
