Good-Bye to 2012: Continuing Our Look Back At The Year In Semantic Tech

Yesterday we began our look back at the year in semantic technology. Today we continue with more expert commentary on the year in review:

Ivan Herman, W3C Semantic Web Activity Lead:

I would mention two things (among many, of course).

  •  Schema.org had an important effect on semantic technologies. Of course, it is controversial (role of one major vocabulary and its relations to others, the community discussions on the syntax, etc.), but I would rather concentrate on the positive aspects. A few years ago the topic of discussion was whether having ‘structured data’, as it is referred to (I would simply say having RDF in some syntax or other), as part of a Web page makes sense or not. There were fairly passionate discussions about this and many were convinced that doing that would not make any sense, there is no use case for it, authors would not use it and could not deal with it, etc. Well, this discussion is over. Structured data in Web sites is here to stay, it is important, and has become part of the Web landscape. Schema.org’s contribution in this respect is very important; the discussions and disagreements I referred to are minor and transient compared to the success. And 2012 was the year when this issue was finally closed.
  •  On a very different aspect (and motivated by my own personal interest) I see exciting moves in the library and the digital publishing world. Many libraries recognize the power of linked data as adopted by libraries, of the value of standard cataloging techniques well adapted to linked data, of the role of metadata, in the form of linked data, adopted by journals and soon by electronic books… All these will have a profound influence bringing a huge amount of very valuable data onto the Web of Data, linking to sources of accumulated human knowledge. I have witnessed different aspects of this evolution coming to the fore in 2012, and I think this will become very important in the years to come.

Good-Bye to 2012: A Look Back At The Year In Semantic Tech, Part 1

As we close out 2012, we’ve asked some semantic tech experts to give us their take on the year that was. Was Big Data a boon for the semantic web, or is the opportunity to capitalize on the connection still pending? Is structured data on the web not just the future but the present? What sector is taking a strong lead in the semantic web space?

We begin with Part 1, with our experts listed in alphabetical order:

John Breslin, lecturer at NUI Galway, researcher and unit leader at DERI, creator of SIOC, and co-founder of Technology Voice and StreamGlider:
I think the schema.org initiative really gaining community support and a broader range of terms has been fantastic. It’s been great to see an easily understandable set of terms for describing the objects in web pages, but also leveraging the experience of work like GoodRelations rather than ignoring what has gone before. It’s also been encouraging to see the growth of Drupal 7 (which produces RDFa data) in the government sector: Estimates are that 24 percent of .gov CMS sites are now powered by Drupal.

Martin Böhringer, CEO & Co-Founder Hojoki:

For us it was very important to see Jena, our Semantic Web framework, becoming an Apache top-level project in April 2012. We see a lot of development pace in this project recently and see a chance to build an open source Semantic Web foundation which can handle cutting-edge requirements.

Still disappointing is the missing link between Semantic Web and the “cool” technologies and buzzwords. From what we see Semantic Web gives answers to some of the industry’s most challenging problems, but it still doesn’t seem to really find its place in relation to the cloud or big data (Hadoop).

Christine Connors, Chief Ontologist, Knowledgent:

One trend that I have seen is increased interest in the broader spectrum of semantic technologies in the enterprise. Graph stores, NoSQL, schema-less and more flexible systems, ontologies (& ontologists!) and integration with legacy systems. I believe the Big Data movement has had a positive impact on this field. We are hearing more and more about “Big Data Analytics” from our clients, partners and friends. The analytical power brought to bear by the semantic technology stack is sparking curiosity – what is it really? How can these models help me mitigate risk, more accurately predict outcomes, identify hidden intellectual assets, and streamline business processes? Real questions, tough questions: fun challenges!

Adding Rich Snippets and Semantic Markup to Your Site

Barbara Starr of SearchEngineLand reports, “Semantic markup is becoming more and more popular in conjunction with large scale SEO. Adding rich snippets to send rich signals to alert search engines as to the relevancy of your content − whatever vertical they may appear in − is not only a wise move, but an SEO best practice. Included below is an illustrative guide highlighting currently available Chrome extensions, which you can leverage to both test on-site markup as well as expose any information regarding your competitors. An example is illustrated [above], and what follows is a guide to getting the information.”

Should Microdata Become a W3C Standard?

Manu Sporny recently voiced his personal objection to the W3C microdata candidate recommendation. He writes, “The HTML Working Group at the W3C is currently trying to decide if they should transition the Microdata specification to the next stage in the standardization process. There has been a call for consensus to transition the spec to the Candidate Recommendation stage. From a standards perspective, this is a huge mistake and sends the wrong signal to Web developers everywhere. The problem is that we already have a set of specifications that are official W3C recommendations that do what Microdata does and more. RDFa 1.1 became an official W3C Recommendation last summer.”

GoodRelations Fully Integrated with Schema.org

Schema.org has announced that GoodRelations is now fully integrated into the markup vocabulary backed by Google, Yahoo!, Bing/Microsoft, and Yandex. GoodRelations is the e-commerce vocabulary that has been developed and maintained by Martin Hepp since 2002.

In the official announcement, R.V. Guha (Google) says, “Effective immediately, the GoodRelations vocabulary (http://purl.org/goodrelations/) is directly available from within the schema.org site for use with both HTML5 Microdata and RDFa. Webmasters of e-commerce sites can use all GoodRelations types and properties directly from the schema.org namespace to expose more granular information for search engines and other clients, including delivery charges, quantity discounts, and product features.”

Google Introduces Structured Data Dashboard

Google has announced the addition of a “Structured Data Dashboard” as a new feature in its Webmaster Tools offerings. The Dashboard gives webmasters greater visibility into the structured data that Google knows about for a given website. This will no doubt come as good news to people wanting confirmation that Google is consuming the structured data they publish.

Google’s Rich Snippet Testing Tool has been around for a while and allows webmasters to see how their semantic markup might appear in a Rich Snippet. There are tools that allow developers to test semantic markup during the development process. However, until now there has not been a good way for a webmaster to see how (or even if) Google was consuming the structured markup in a given site.

Schema.org adds “Additional Type” Property

Dan Brickley announced today that schema.org has added the property “additionalType” to the basic building block, schema.org/Thing. As Brickley says, “The additionalType property makes it possible for Microdata-based publishers to list several relevant types, even when the types are from diverse, independent schemas. This is important for schema.org as it allows our markup to be mixed with other systems, without making it too hard for consuming applications to interpret. A description can use a schema.org type as a base, but mention others (e.g. from DBpedia, Freebase, eventually Wikidata…) to improve the specificity and detail of the description.”

As RDFa already allows for the use of multiple vocabularies (through the ‘typeof’ attribute), it is recommended that RDFa publishers use that native syntax.
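
To make the Microdata case concrete, here is a minimal Python sketch of how a consuming application might gather the types attached to an item. It is an illustration only, not a full Microdata parser: the sample markup, the DBpedia URL, and the names `TypeCollector` and `collect_types` are invented for this example, and the sketch ignores nested items.

```python
from html.parser import HTMLParser

# Invented sample markup: a schema.org base type plus one additionalType.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Product">
  <link itemprop="additionalType" href="http://dbpedia.org/resource/Hammer">
  <span itemprop="name">Claw hammer</span>
</div>
"""


class TypeCollector(HTMLParser):
    """Collects itemtype and additionalType URLs in document order.

    Simplification (assumption): all types go into one flat list,
    so nested items are not told apart.
    """

    def __init__(self):
        super().__init__()
        self.types = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # The base type(s) declared on the item itself.
        if "itemscope" in attr and attr.get("itemtype"):
            self.types.extend(attr["itemtype"].split())
        # Extra types listed through the additionalType property.
        if attr.get("itemprop") == "additionalType":
            url = attr.get("href") or attr.get("content")
            if url:
                self.types.append(url)


def collect_types(html):
    parser = TypeCollector()
    parser.feed(html)
    return parser.types


print(collect_types(SAMPLE_HTML))
```

A consumer that only understands schema.org can stop at the first entry, while one that also knows DBpedia can use the second for a more specific description.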

Common Crawl Corpus Update Makes Web Crawl Data More Efficient, Approachable For Users To Explore

Common Crawl now provides its 2012 corpus of web crawl data not just as .ARC files but also as metadata files (JSON-based metadata with all the links from every page crawled, metatags, headers and so on) and as text output.

Semantic web projects that use its corpus include the work of Web Data Commons, which last month created a new analysis on vocabulary usage by pay-level domain in its microdata and RDFa dataset.

With the metadata files, users don’t have to extract the link graph from the raw crawl, which, says Common Crawl Chief Architect Ahad Rana, is “pretty significant for the community. They don’t have to expend all this CPU power to extract the links. And metadata files are a much smaller set of data than the raw corpus.” Similarly, the full text output that users now can run analysis over is significantly smaller than the .ARC file raw content.
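
As a rough illustration of why the metadata files save work, the Python sketch below builds a host-level link graph straight from JSON records, without touching raw page content. The record layout is an assumption made for this example: the article does not describe the actual schema, so the field names `url` and `links`, the one-object-per-line format, and the sample data are all hypothetical. Hosts are also used as a simplification; they are not the same as the pay-level domains used in the Web Data Commons analysis.

```python
import json
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical metadata records, one JSON object per line.
# The real Common Crawl field names may differ.
SAMPLE_LINES = [
    '{"url": "http://a.example/page1", '
    '"links": ["http://b.example/", "http://a.example/page2"]}',
    '{"url": "http://b.example/", "links": ["http://c.example/x"]}',
]


def host_link_graph(lines):
    """Maps each source host to the sorted list of other hosts it links to."""
    graph = defaultdict(set)
    for line in lines:
        record = json.loads(line)
        source = urlparse(record["url"]).netloc
        for target_url in record.get("links", []):
            target = urlparse(target_url).netloc
            # Skip relative links (empty host) and links within the same host.
            if target and target != source:
                graph[source].add(target)
    return {host: sorted(targets) for host, targets in graph.items()}


print(host_link_graph(SAMPLE_LINES))
```

Because each record already lists its outgoing links, the only per-page cost here is decoding one small JSON object rather than parsing the crawled HTML.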

Datasets Addition Promising Extension For Schema.Org

A call for comments is out on a proposal for a ‘Datasets’ addition to schema.org, issued via the W3C’s Web Schemas task force, which the schema.org project uses to collaborate with the wider community.

The proposal, which extends schema.org for describing datasets and data catalogs, introduces three new types with associated properties.

Writing at the Schema.org blog, Dan Brickley calls it a “small but useful vocabulary,” with particular relevance to open government and public sector data.

Come Together: Schema.Org and Web Intents Make For Win-Win Opportunities

What do you get when you partner up the Schema.org markup vocabulary and the Web Intents specification? A win-win both for content publishers and search engines, says Dr. Michael Hausenblas, Linked Data Research Centre, DERI, NUI Galway, Ireland.

Hausenblas this week wrote about the “awesomeness” of connecting the two, describing how a search for a camera marked up using the schema.org vocabulary also could serve up a wave of Web Intents actions (existing and new ones) to take on the object. That could range from reviewing it to buying it.

“With Schema.org we have a way to describe the things we publish on our Web pages, such as books or cameras. And with WebIntents we have a technology at hand that allows us to interact with these things in a flexible way,” he wrote. With Web Intents, a framework for client-side service discovery and inter-application communication, services register their intention to be able to handle an action on the user’s behalf.

Speaking with the Semantic Web Blog, Hausenblas explains how the win-win happens: “Content publishers have an added incentive to use semantic markup there, not just to be better-ranked but to make their content more interactive,” he says. “And it’s a huge thing for search engines, as users can directly interact from them.”

