Alexis Madrigal of The Atlantic recently shared an interview with John Giannandrea regarding Google’s Knowledge Graph. Madrigal writes, “The ugly truth is that computers don’t know anything. They have no common sense. This idea had been circulating in Metaweb co-founder John Giannandrea’s head since 1997 when he was working at Netscape and thinking about how to reveal what you did not know you didn’t know on the web. If you were looking at search results for a hiking trail, say, what other hiking trails might you look at? Giannandrea called it ‘going sideways through the web,’ and he loved the idea, even if he couldn’t execute it back then.” Read more
Perhaps one of the most anticipated panels at next week’s Semantic Technology & Business Conference in San Francisco is the Wednesday morning session on Schema.org. Since Schema.org was announced just prior to last year’s SemTech Business Conference on the west coast, using its shared vocabularies along with the microdata format to mark up web pages has been much debated, raising a question in the minds of webmasters and web search marketers: “Which way should we go? Microdata or RDFa?”
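For readers weighing that question, here is a minimal, hypothetical sketch of the two syntaxes side by side. The event name and venue are invented, and the Python wrapper is simply a convenient way to keep the two HTML fragments together for comparison; neither fragment is drawn from a real page.

```python
# Hypothetical comparison of the two Schema.org markup syntaxes.
# Both fragments describe the same invented event using the
# http://schema.org/Event type.

# Microdata uses itemscope/itemtype/itemprop attributes.
microdata_markup = """
<div itemscope itemtype="http://schema.org/Event">
  <span itemprop="name">SemTech Business Conference</span>
  <span itemprop="location">San Francisco</span>
</div>
"""

# RDFa Lite expresses the same statements with vocab/typeof/property attributes.
rdfa_lite_markup = """
<div vocab="http://schema.org/" typeof="Event">
  <span property="name">SemTech Business Conference</span>
  <span property="location">San Francisco</span>
</div>
"""

if __name__ == "__main__":
    print("Microdata:", microdata_markup)
    print("RDFa Lite:", rdfa_lite_markup)
```

Functionally the two fragments make the same statements to a consuming parser; much of the debate has been about authoring ergonomics and which syntax search engines will support.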
Remember the days before Wikipedia had all the answers? We looked things up in libraries, referring to shelf-filling encyclopaedias. We bought CD-ROMs (remember them?) full of facts and pictures and video clips. We asked people. Sometimes, school homework actually required some work more strenuous than a cut and paste. We went about our business without remembering that New Coke briefly entered our lives on this day in 1985.
Wikipedia is far from perfect, and some of the concern around its role in a wider dumbing down of thought and argument may be justified. But, despite that, it’s a remarkable achievement and a wonderful resource. Those who argued that it would never work have clearly been proven wrong. Carefully maintained processes and the core principle of the neutral point of view mostly serve contributors well.
Just a little over a year ago The Semantic Web Blog introduced our readers to Gravity in this article. The project, spearheaded by former MySpace execs, is focused on building the Interest Graph. The team’s been pretty quiet about development efforts since that time — until just this month, when it announced Gravity Labs to let the public in on a little more about its underlying Interest Graph infrastructure and to showcase the platform. It also announced that it was open-sourcing some of the “plumbing” code it came up with during development, while understandably keeping its core IP, ontology and algorithms under wraps.
The announcement noted that the internally-named Gravity Interest Service for personalizing content at scale, in real time, went live at production scale six months ago. So far the technology has created over 400 million user interest graphs; served over 13 million pieces of personalized content per day; personalized the daily Internet experience of tens of millions of users per month; and processed over 25 million inbound interest signals per day, the company says. At this rate, it expects to be handling 10x all of these numbers in under six months.
The Semantic Web Blog once again caught up with Gravity CTO Jim Benedetto to talk some more about the Interest Graph, a term he acknowledges gets thrown around quite a bit these days, with a lot of web sites claiming they’ve got the goods. But, he says, “what they effectively are saying is that buried deep within the data of our logs or deep in the data of how our users interact with our site, we know there are interest indicators there. But a lot of them are not doing much with their data.” Interest graphs aren’t owned, Benedetto says; interest data resides in individual places and across the web at large, and sites need the Gravity platform to help unlock it to create dynamic and personalized experiences for users.
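Gravity has not published its ontology or algorithms, so the following is only a hypothetical sketch of what turning inbound interest signals into per-user interest graphs and personalized content rankings might look like. The user names, topics, and weighting scheme are invented for illustration and do not describe Gravity’s actual platform.

```python
from collections import defaultdict

# Hypothetical sketch of an "interest graph": per-user topic weights built
# from inbound interest signals (clicks, shares, reads) and used to rank
# content. Invented for illustration; not Gravity's implementation.

interest_graphs = defaultdict(lambda: defaultdict(float))  # user -> topic -> weight

def record_signal(user_id, topic, strength=1.0):
    """Fold one inbound interest signal into the user's interest graph."""
    interest_graphs[user_id][topic] += strength

def rank_content(user_id, candidates):
    """Order candidate items (each tagged with topics) by the user's interests."""
    weights = interest_graphs[user_id]
    return sorted(
        candidates,
        key=lambda item: sum(weights.get(t, 0.0) for t in item["topics"]),
        reverse=True,
    )

record_signal("alice", "hiking", 2.0)
record_signal("alice", "travel")
print(rank_content("alice", [
    {"title": "Top trails near Lake Tahoe", "topics": ["hiking", "travel"]},
    {"title": "Quarterly earnings recap", "topics": ["finance"]},
]))
```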
The W3C has announced a new workshop, Using Open Data: policy modeling, citizen empowerment, data journalism. According to the article, “For many years, W3C has been a keen promoter of Open Data, fostering a culture in which public administrations make their data available, ideally in machine-processable formats. Many governments have embraced the idea with enthusiasm, setting up national data portals. As part of the FP7-funded Crossover Project, W3C and the European Commission are running a Workshop to ask a simple question: what is all the ‘new’ government open data being used for?” Read more
ComScore this week issued a report that wasn’t particularly flattering to Google Plus. It noted that users spent an average of just 3.3 minutes on the social network in January, compared to 7.5 hours for Facebook. Much of the discussion revolved around the fact that Google last month touted that the service had grown to 90 million users from 40 million in October.
Google Plus, as The Semantic Web Blog reported here, informs the personalized results that are delivered through Search Plus Your World, such as the Google+ photos and posts users have shared or that have been shared with them through the social network.
One question raised by the ComScore report is what impact the slow take-up might have, if any, on Search Plus Your World. Shortly after Google Plus’ debut, The Semantic Web Blog published a post by Christine Connors, principal at TriviumRLG LLC, discussing why, as she has put it, the service is “one of the subtlest and most user-friendly ontology development systems we’ve ever seen.” Of the ComScore data, she says, “that’s an ‘average’ number. Which means that millions of folks who’ve signed up haven’t used it, and far fewer millions spend hours on it every month. What that says to me is that for some people Search Plus Your World would be almost useless, and for those who use G+ regularly SPYW has a decent and always improving personalized algorithm and index behind it. Take out the privacy concerns and the people using G+ will have an increasingly positive sense of satisfaction with Google for Search and more. Problem is, taking out the privacy concerns is very troublesome.”
I’m very happy to announce that the World Wide Web Consortium’s RDB2RDF Working Group, in which I participate as an Invited Expert, has published two Candidate Recommendations: R2RML: RDB to RDF Mapping Language and A Direct Mapping of Relational Data to RDF. This has been a long road and we still have some ways to go. The standardization process goes back to the W3C Workshop on RDF Access to Relational Databases, which took place in October 2007. The W3C RDB2RDF Incubator Group followed afterwards. After almost 5 years, we are on track to have a standard. However, what is this standard bringing to the table?
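By way of illustration, here is a small, simplified sketch of what the Direct Mapping specifies for a single relational row: the row becomes a subject IRI built from the table name and primary key, gets a type triple naming the table, and each column becomes a predicate of the form table#column. The base IRI, table, and data below are invented, and the sketch ignores multi-column keys, foreign keys, NULLs, and IRI percent-encoding.

```python
from rdflib import Graph, Literal, RDF, URIRef

# Simplified sketch of the W3C Direct Mapping for one row:
#   subject IRI = <base><table>/<pk-column>=<pk-value>
#   type triple = subject rdf:type <base><table>
#   predicates  = <base><table>#<column>
# Ignores multi-column keys, foreign keys, NULLs, and percent-encoding.

BASE = "http://example.com/db/"  # invented base IRI

def direct_map_row(graph, table, primary_key, row):
    subject = URIRef(f"{BASE}{table}/{primary_key}={row[primary_key]}")
    graph.add((subject, RDF.type, URIRef(f"{BASE}{table}")))
    for column, value in row.items():
        graph.add((subject, URIRef(f"{BASE}{table}#{column}"), Literal(value)))
    return subject

g = Graph()
direct_map_row(g, "People", "ID", {"ID": 7, "fname": "Bob", "city": "Cambridge"})
print(g.serialize(format="turtle"))
```

R2RML covers the cases the Direct Mapping leaves as defaults, letting the mapping author choose the target vocabulary, IRI templates, and join behavior instead of accepting the table-shaped output above.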
How do you define a forest? How about deforestation? It sounds like it would be fairly easy to get agreement on those terms. But beyond the basics – that a definition for the first would reflect that a forest is a place with lots of trees and the second would reflect that it’s a place where there used to be lots of trees – it’s not so simple.
And that has consequences for everything from academic and scientific research to government programs. As explained by Krzysztof Janowicz, perfectly valid definitions for these and other geographic terms exist by the hundreds, in legal texts and government documents and elsewhere, and most of them don’t agree with each other. So, how can one draw good conclusions or make important decisions when the data informing them is all over the map, so to speak?
“You cannot ask to show me a map of the forests in North America because the definition of forest differs between not just the U.S. and Canada but also between U.S. member states,” says Janowicz, an assistant professor of geographic information science at UC Santa Barbara and one of the organizers of this week’s GeoVoCamp, which focuses on geo-ontology design patterns and bottom-up, data-driven semantics.
It cannot be denied that Stephen Wolfram knows data. As the person behind Mathematica and Wolfram|Alpha, he has been working with data — and the computation of that data — for a long time. As he said in his blog yesterday, “In building Wolfram|Alpha, we’ve absorbed an immense amount of data, across a huge number of domains. But—perhaps surprisingly—almost none of it has come in any direct way from the visible internet. Instead, it’s mostly from a complicated patchwork of data files and feeds and database dumps.”
The main topic of Wolfram’s post is a proposal about the form and placement of raw data on the internet. In the post, he proposes that .data be created as a new generic Top-Level Domain (gTLD) to hold data in a “parallel construct.”
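To make the “parallel construct” idea concrete, here is a hypothetical sketch of how a client might derive the companion .data location for an ordinary web address if the proposed gTLD existed. The domain names are invented, nothing under .data resolves today, and the mapping rule is only one plausible reading of the proposal.

```python
from urllib.parse import urlparse

def parallel_data_url(url):
    """Map an ordinary web URL to a hypothetical counterpart under a .data gTLD.

    Illustration only: assumes the data site mirrors the main site's name,
    e.g. http://example.com/reports -> http://example.data/reports.
    No such domains exist today.
    """
    parts = urlparse(url)
    host = parts.hostname or ""
    labels = host.split(".")
    name = labels[-2] if len(labels) >= 2 else host  # drop the existing TLD
    return f"{parts.scheme}://{name}.data{parts.path or '/'}"

print(parallel_data_url("http://example.com/reports/2011"))
# -> http://example.data/reports/2011
```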
We recently rounded up some thought leaders’ perspectives on the big semantic trends of 2011 – most (if not all) of them positive. Here’s some further perspective about where hopes and expectations fell a little short of reality:
- The biggest lost possibility was not rethinking the whole RDF stack. Instead of actually reducing complexity, it seems the direction is hiding complexity. This makes its proposition unattractive for web developers. – Andraž Tori, Founder and Director, Zemanta