Posts Tagged ‘microdata’

Manu Sporny recently voiced his personal objection to the W3C microdata candidate recommendation. He writes, “The HTML Working Group at the W3C is currently trying to decide if they should transition the Microdata specification to the next stage in the standardization process. There has been a call for consensus to transition the spec to the Candidate Recommendation stage. From a standards perspective, this is a huge mistake and sends the wrong signal to Web developers everywhere. The problem is that we already have a set of specifications that are official W3C recommendations that do what Microdata does and more. RDFa 1.1 became an official W3C Recommendation last summer.”
Schema.org has announced that GoodRelations is now fully integrated into the markup vocabulary backed by Google, Yahoo!, Bing/Microsoft, and Yandex (read our past schema.org coverage). GoodRelations is the e-commerce vocabulary that has been developed and maintained by Martin Hepp since 2002 (previous coverage).
In the official announcement, R.V. Guha (Google) says, “Effective immediately, the GoodRelations vocabulary (http://purl.org/
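The practical upshot of the integration is that e-commerce features that originated in GoodRelations now appear as ordinary schema.org terms. A minimal sketch of what such markup might look like in microdata (the product name and values are invented for illustration):

```html
<!-- Illustrative sketch: an offer marked up with schema.org microdata,
     using e-commerce properties that trace back to GoodRelations. -->
<div itemscope itemtype="http://schema.org/Offer">
  <span itemprop="name">Blend-O-Matic 9000</span>
  <span itemprop="price" content="99.95">$99.95</span>
  <meta itemprop="priceCurrency" content="USD">
  <link itemprop="availability" href="http://schema.org/InStock">In stock
</div>
```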
“The idea of the Big S Semantic Web seems to have fallen off by the wayside in publishing as people are just trying to structure their data,” says Barbara McGlamery, taxonomist at Martha Stewart Living Omnimedia.
McGlamery, who will be presenting a case study comparing her experiences in two publishing houses that took opposite approaches to the semantic web at the SemTech conference in NYC this month, says that the path most publishers are on now “hardly seems like the same beast” as the one she formerly knew. A few years back, the focus was on RDF, OWL, full-blown ontologies and inferencing engines, whereas today “it’s schema.org and we’re using microdata, not even RDFa.”
Google has released the structured data testing tool, a new and renamed version of its rich snippet testing tool. According to a blog post by Yong Zhu, writing on behalf of the rich snippets testing tool team, improvements include:
- Rich snippets are now displayed in the testing tool in a way that better matches how they appear in search results;
- A new visual design makes it clearer what structured data the tool can extract from the page, and how that data may be shown in search results;
- And the tool is now available in languages other than English (French, Spanish, and Arabic, for example) to help webmasters from around the world build structured-data-enabled websites.
Google has announced the addition of a “Structured Data Dashboard” as a new feature in its Webmaster Tools offerings. The Dashboard gives webmasters greater visibility into the structured data that Google knows about for a given website. This will no doubt come as good news to people wanting confirmation that Google was consuming the structured data being published.
Google’s Rich Snippet Testing Tool has been around for a while and allows webmasters to see how their semantic markup might appear in a Rich Snippet. There are tools that allow developers to test semantic markup during the development process. However, until now there has not been a good way for a webmaster to see how (or even if) Google was consuming the structured markup in a given site.
Dan Brickley announced today that schema.org has added the property “additionalType” to the basic building block, schema.org/Thing. As Brickley says, “The additionalType property makes it possible for Microdata-based publishers to list several relevant types, even when the types are from diverse, independent schemas. This is important for schema.org as it allows our markup to be mixed with other systems, without making it too hard for consuming applications to interpret. A description can use a schema.org type as a base, but mention others (e.g. from DBpedia, Freebase, eventually Wikidata…) to improve the specificity and detail of the description.”
As RDFa already allows for the use of multiple types from multiple vocabularies (through the ‘typeof’ attribute), it is recommended that RDFa publishers use that native syntax.
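To make the contrast concrete, here is an illustrative sketch of the two approaches side by side (the DBpedia class URI is an invented example, not from the announcement):

```html
<!-- Microdata: a schema.org base type, plus an additionalType
     pointing at a class from another vocabulary. -->
<div itemscope itemtype="http://schema.org/Place">
  <link itemprop="additionalType" href="http://dbpedia.org/ontology/Museum">
  <span itemprop="name">City Museum</span>
</div>

<!-- RDFa: multiple types listed natively in the typeof attribute. -->
<div vocab="http://schema.org/"
     prefix="dbpedia-owl: http://dbpedia.org/ontology/"
     typeof="Place dbpedia-owl:Museum">
  <span property="name">City Museum</span>
</div>
```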
Common Crawl is now providing its 2012 corpus of web crawl data not just as .ARC files; it is also releasing the metadata files (JSON-based metadata with all the links from every page crawled, metatags, headers and so on) as well as full text output.
With the metadata files, users don’t have to extract the link graph from the raw crawl, which, says Common Crawl Chief Architect Ahad Rana, is “pretty significant for the community. They don’t have to expend all this CPU power to extract the links. And metadata files are a much smaller set of data than the raw corpus.” Similarly, the full text output that users now can run analysis over is significantly smaller than the .ARC file raw content.
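As a sketch of how those metadata files might be consumed, the snippet below parses a single JSON record and pulls out its outgoing links. The record and its field names are invented for illustration and do not reflect the exact Common Crawl schema:

```python
import json

# A hypothetical metadata record; field names are illustrative only,
# not the actual Common Crawl metadata schema.
record_line = json.dumps({
    "url": "http://example.com/",
    "http_headers": {"content-type": "text/html"},
    "links": [
        {"href": "http://example.com/about", "type": "a"},
        {"href": "http://example.org/partner", "type": "a"},
    ],
})

def extract_links(line):
    """Parse one JSON metadata line and return its outgoing link URLs."""
    record = json.loads(line)
    return [link["href"] for link in record.get("links", [])]

print(extract_links(record_line))
```

The point of the metadata release is exactly this: the link graph comes pre-extracted per page, so no pass over the raw .ARC content is needed.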
Manu Sporny recently shared his views regarding the difference between RDFa Lite and microdata. Sporny writes, “RDFa 1.1 became an official Web specification last month. Google started supporting RDFa in Google Rich Snippets some time ago and has recently announced that they will support RDFa Lite for schema.org as well. These announcements have led to a weekly increase in the number of times the following question is asked by Web developers on Twitter and Google+: ‘What should I implement on my website? Microdata or RDFa?’ This blog post attempts to answer the question once and for all. It dispels some of the myths around the Microdata vs. RDFa debate and outlines how the two languages evolved to solve the same problem in almost exactly the same way.”
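To illustrate just how closely the two languages mirror each other, here is the same minimal description expressed in both syntaxes (an illustrative sketch, not markup taken from Sporny’s post):

```html
<!-- Microdata -->
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Manu Sporny</span>
</div>

<!-- RDFa Lite -->
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Manu Sporny</span>
</div>
```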
(Editor’s Note, June 29: The SparQLed project URL now is available here.)
SindiceTech today released SparQLed, the SindiceTech Assisted SPARQL Editor, as an open source project. SindiceTech, a spinoff company from the DERI Institute, commercializes large-scale, Big Data infrastructures for enterprises dealing with semantic data. It has roots in the semantic web index Sindice, which lets users collect, search, and query semantically marked-up web data (see our story here).
SparQLed also is one of the components of the commercial Sindice Suite for helping large enterprises build private linked data clouds. It is designed to give users all the help they need to write SPARQL queries to extract information from interconnected datasets.
“SPARQL is exciting but it’s difficult to develop and work with,” says Giovanni Tummarello, who led the efforts around the Sindice search and analysis engine and is founder and CEO of SindiceTech.
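For readers unfamiliar with the language, a small example of the kind of query an assisted editor helps compose might look like the following (the FOAF-based data it assumes is purely illustrative):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Find up to ten people and their stated interests.
SELECT ?name ?topic
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name ;
          foaf:topic_interest ?topic .
}
LIMIT 10
```

Even a query this simple requires knowing the prefixes, classes, and properties actually used in the dataset, which is the gap an assisted editor like SparQLed aims to close.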