Mark Albertson of the Examiner recently wrote, “It was an unusual sight to be sure. Standing on a convention center stage together were computer engineers from the four largest search providers in the world (Google, Yahoo, Microsoft Bing, and Yandex). Normally, this group couldn’t even agree on where to go for dinner, but this week in San Jose, California they were united by a common cause: the Semantic Web… At the Semantic Technology and Business Conference in San Jose this week, researchers from around the world gathered to discuss how far they have come and the mountain of work still ahead of them.”
There is no doubt about it: Schema.org is a big success. It has motivated hundreds of thousands of Web site owners to add structured data markup to their HTML templates and brought the idea of exchanging structured data over the WWW from the labs and prototypes to real business.
Unfortunately, support for information about the sale and rental of vehicles, namely cars, motorbikes, trucks, boats, and bikes, has been insufficient for quite a while. Apart from two simple classes, http://schema.org/Vehicle and http://schema.org/Car, with no additional properties, there was nothing in the vocabulary that would help mark up granular vehicle information on new or used car listing sites or in car rental offers.
Recently, Mirek Sopek, Karol Szczepański and I released a fully-fledged extension proposal for schema.org that fixes this shortcoming and paves the way for much better automotive Web sites when it comes to marketing with structured data.
This proposal builds on the following vehicle-related extensions for GoodRelations, the e-commerce model of schema.org:
- Vehicle Sales Ontology (VSO), http://purl.org/vso/ns
- Volkswagen Vehicles Ontology (VVO), http://purl.org/vvo/ns
- Used Cars Ontology (UCO), http://purl.org/uco/ns
It adds the core classes, properties and enumerated values for describing cars, trucks, buses, bikes, and boats and their features. For describing commercial aspects of related offers, http://schema.org/Offer already provides the necessary level of detail. Thus, our proposal does not add new elements for commercial features.
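As a rough illustration of the split described above, the following Python sketch builds a JSON-LD description in which the commercial details sit on http://schema.org/Offer and the vehicle itself is a http://schema.org/Car. It uses only the core properties `price`, `priceCurrency`, `itemOffered` and `name`; the granular vehicle properties of the proposal are not shown because the text does not list them. The helper name `car_offer` and the listing values are hypothetical.

```python
import json


def car_offer(name, price, currency):
    """Return a schema.org Offer for a car as a JSON-LD-ready dict.

    Hypothetical helper: commercial data lives on the Offer,
    the vehicle is described by the nested Car.
    """
    return {
        "@context": "http://schema.org",
        "@type": "Offer",
        "price": price,
        "priceCurrency": currency,
        "itemOffered": {"@type": "Car", "name": name},
    }


# Made-up listing values, for illustration only.
listing = car_offer("Example Hatchback 1.6", "8500.00", "EUR")
print(json.dumps(listing, indent=2))
```

The printed JSON could be placed in a `<script type="application/ld+json">` element of a listing page; an extension such as the one proposed would add further properties to the nested `Car` object.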
Among the mainstream content management systems, you could make the case that Drupal was the first open source semantic CMS out there. At next week’s Semantic Technology and Business Conference, software engineer Stéphane Corlosquet of Acquia, which provides enterprise-level services around Drupal, and Bock & Co. principal Geoffrey Bock will discuss in this session Drupal’s role as a semantic CMS and how it can help organizations and institutions that are yearning to enrich their data with more semantics – for search engine optimization, yes, but also for more advanced use cases.
“It’s very easy to embed semantics in Drupal,” says Bock, who analyses and consults on digital strategies for content and collaboration. At its core it has the capability to manage semantic entities, and in the upcoming version 8 it takes things to a new level by including schema.org as a foundational data type. “It will become increasingly easier for developers to build and deliver semantically enriched environments,” he says, which can drive a better experience both for clients and stakeholders.
Corlosquet, who has taken a leadership role in building semantic web capabilities into Drupal’s core and maintains the RDF module in Drupal 7 and 8, explains that the closer embrace of schema.org in Drupal is of course a help when it comes to SEO and user engagement, for starters. Google uses content marked up using schema.org to power products like Rich Snippets and Google Now, too.
In Part 3 of this series, Jarek Wilkiewicz details activating the small Knowledge Graph (built on Cayley) with Schema.org Actions. He begins by explaining how Actions can be thought of as a combination of “Entities” (things) and “Affordances” (uses). As he defines it, “An affordance is a quality of an object, or an environment, which allows an individual to perform an action.”
For example, an action might be using the “ok Google” voice command on a mobile device. The even more specific example that Wilkiewicz gives in the video (spoiler alert) is that of using the schema.org concept of potentialAction to trigger the playing of a specific artist’s music in a small music store’s mobile app.
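The potentialAction idea can be sketched in Python as a JSON-LD description of an artist that advertises a way to listen to its music. This is a minimal sketch, not the markup from the video: the artist name, the URL and the helper name `artist_with_listen_action` are hypothetical, while `MusicGroup`, `ListenAction`, `EntryPoint` and `urlTemplate` are schema.org terms.

```python
import json


def artist_with_listen_action(artist_name, url_template):
    """Describe an artist (the entity) plus a ListenAction (the affordance)."""
    return {
        "@context": "http://schema.org",
        "@type": "MusicGroup",
        "name": artist_name,
        "potentialAction": {
            "@type": "ListenAction",
            # Where a client can go to carry out the action.
            "target": {"@type": "EntryPoint", "urlTemplate": url_template},
        },
    }


# Hypothetical artist and URL for a small music store.
artist = artist_with_listen_action(
    "Example Band", "http://music.example.com/artist/example-band/listen"
)
print(json.dumps(artist, indent=2))
```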
To learn more, and to meet Jarek Wilkiewicz and his Google colleague, Shawn Simister, in person, register for the Semantic Technology & Business Conference where they will present “When 2 Billion Freebase Facts is Not Enough.”
Barak Michener, Software Engineer, Knowledge NYC, has posted on the Google Open Source Blog about “Cayley, an open source graph database”: “Four years ago this July, Google acquired Metaweb, bringing Freebase and linked open data to Google. It’s been astounding to watch the growth of the Knowledge Graph and how it has improved Google search to delight users every day. When I moved to New York last year, I saw just how far the concepts of Freebase and its data had spread through Google’s worldwide offices. I began to wonder how the concepts would advance if developers everywhere could work with similar tools. However, there wasn’t a graph available that was fast, free, and easy to get started working with. With the Freebase data already public and universally accessible, it was time to make it useful, and that meant writing some code as a side project.”
The post continues: “Cayley is a spiritual successor to graphd; it shares a similar query strategy for speed. While not an exact replica of its predecessor, it brings its own features to the table: RESTful API, multiple (modular) backend stores such as LevelDB and MongoDB, multiple (modular) query languages, easy to get started, simple to build on top of as a library, and of course open source. Cayley is written in Go, which was a natural choice. As a backend service that depends upon speed and concurrent access, Go seemed like a good fit.”
Straight out of Google I/O this week came some interesting announcements related to Semantic Web technologies and Linked Data. Included in the mix was a cool instructional video series about how to “Build a Small Knowledge Graph.” Part 1 was presented by Jarek Wilkiewicz, Knowledge Developer Advocate at Google (and SemTechBiz speaker).
Wilkiewicz fits a lot into the seven-and-a-half minute piece, in which he presents a (sadly) hypothetical example of an online music store that he creates with his Google colleague Shawn Simister. During the example, he demonstrates the power and ease of leveraging multiple technologies, including the schema.org vocabulary (particularly the recently announced ‘Actions’), the JSON-LD syntax for expressing the machine readable data, and the newly launched Cayley, an open source graph database (more on this in the next post in this series).
Standard Analytics, which was a participant at the recent TechStars event in New York City, has a big goal on its mind: To organize the world’s scientific information by building a complete scientific knowledge graph.
The company’s co-founders, Tiffany Bogich and Sebastien Ballesteros, came to the conclusion that someone had to take on the job as a result of their own experience as researchers. A problem they faced, says Bogich, was being able to access all the information behind published results, as well as search and discover across papers. “Our thesis is that if you can expose the moving parts – the data, code, media – and make science more discoverable, you can really advance and accelerate research,” she says.
In a post yesterday at the official schema.org blog, Vicki Tardif Holland (Google) and Jason Johnson (Microsoft) have announced that schema.org has created a way to more richly describe relationships between entities in structured markup. The addition of the “Role” schema allows for the description of more complex relationships than were previously possible. In the post, the authors cite the business need as one that is often found in the domains of entertainment and sports.
For example, in schema.org, it can be asserted that Bill Murray was an actor in the film Ghostbusters [Fig. 1].
That’s all well and good, but how can one extend this relationship to include more detail such as the name of the character Mr. Murray played in the film? More on that in a moment.
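One way to picture the difference is the Python sketch below, which builds both forms as JSON-LD-ready dicts: the plain assertion, and a version in which the `actor` value is wrapped in a schema.org `PerformanceRole` that carries a `characterName`. The helper name `with_role` is hypothetical, and the character name comes from the film itself rather than from the text above.

```python
import json

# Plain assertion: Bill Murray was an actor in Ghostbusters.
simple = {
    "@context": "http://schema.org",
    "@type": "Movie",
    "name": "Ghostbusters",
    "actor": {"@type": "Person", "name": "Bill Murray"},
}


def with_role(movie, character_name):
    """Return a copy of the movie whose actor is wrapped in a PerformanceRole."""
    enriched = dict(movie)  # shallow copy; the original dict is left untouched
    enriched["actor"] = {
        "@type": "PerformanceRole",
        "actor": movie["actor"],          # the Role repeats the property it qualifies
        "characterName": character_name,  # the extra detail about the relationship
    }
    return enriched


detailed = with_role(simple, "Dr. Peter Venkman")
print(json.dumps(detailed, indent=2))
```

The Role sits between the film and the person, so a consumer that ignores roles can still follow `actor` twice to reach the same `Person`.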
The Learning Resource Metadata Initiative (LRMI) has released a technical briefing about schema.org. The paper was co-authored by Phil Barker and Lorna M. Campbell of Cetis, the Centre for Educational Technology, Interoperability and Standards.
LRMI, which we have reported on here, “has developed a common metadata framework for describing or ‘tagging’ learning resources on the web.”
The Cetis website says, “This briefing describes schema.org for a technical audience. It is aimed at people who may want to implement schema.org markup in websites or other tools they build but who wish to know more about the technical approach behind schema.org and how to implement it. We also hope that this briefing will be useful to those who are evaluating whether to implement schema.org to meet the requirements of their own organization.”
In making the announcement in a W3C list, Barker explained, “We often find that when explaining the technology approach of LRMI we are mostly talking about schema.org, so this briefing, which describes the schema.org specification for a technical audience should be of interest to anyone thinking about implementing or using LRMI in a website or other tool. It should also be of interest to people who plan to use schema.org for describing other types of resources.”
The technical brief can be downloaded from:
In the winter of 2012, The New York Times began its implementation of the schema.org compatible version of rNews, a standard for embedding machine-readable publishing metadata into HTML documents, to improve the quality and appearance of its search results, as well as generate more traffic through algorithmically generated links. The semantic markup for news articles brought to its web pages structured data properties to define author, the date a work was created, its editor, headline, and so on.
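To make the properties named above concrete, here is a minimal Python sketch that renders an article header with schema.org microdata for `headline`, `author`, `editor` and `dateCreated`. It is an illustration only and not The New York Times’ actual markup: the function name `article_microdata` and all sample values are hypothetical, and author and editor are given as plain text rather than nested `Person` items to keep the example short.

```python
from html import escape


def article_microdata(headline, author, editor, date_created):
    """Render a NewsArticle fragment with schema.org microdata attributes."""
    return (
        '<article itemscope itemtype="http://schema.org/NewsArticle">\n'
        f'  <h1 itemprop="headline">{escape(headline)}</h1>\n'
        f'  <span itemprop="author">{escape(author)}</span>\n'
        f'  <span itemprop="editor">{escape(editor)}</span>\n'
        f'  <time itemprop="dateCreated" datetime="{escape(date_created)}">'
        f'{escape(date_created)}</time>\n'
        '</article>'
    )


# Made-up values, for illustration only.
markup = article_microdata(
    "Example Headline & Subhead", "A. Reporter", "B. Editor", "2012-02-01"
)
print(markup)
```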
But according to a leaked New York Times internal innovation report that appears here, there’s more work to be done in the structured data realm. That work is part of a grand plan to truly put digital first in the face of falling website and smartphone app readership and hotter competition from both old guard and new age newsrooms, as well as from social media properties that are transforming how journalism is delivered to an audience increasingly invested in mobile, social, and personalized technologies.
The report was put together with insights from parties including Evan Sandhaus, director for search, archives and semantics at The NY Times, who was instrumental in the rNews/schema.org effort as well as the TimesMachine relaunch, a digital archive of 46,592 issues of The New York Times whose use includes surrounding current news stories with context. While the report notes that the Gray Lady has not been standing still in the face of its challenges, citing newsroom advances to grow audience with efforts such as using data to inform decisions, it needs to do more – faster – to make it easy to get its content in front of digital readers.