It’s been said that I’ve called Google Plus “one of the subtlest and most user-friendly ontology development systems we’ve ever seen.” I did, and you can listen for yourselves on the Semantic Link podcast.
Why did I do so? Well, G+ follows some of the basic principles of linked data: it uses persistent HTTP URIs for people, Sparks (concepts) and posts. It allows you to indicate a relationship between two entities and give that relationship a type. It collects, and types, attributes about entities from the expected experts – the entities themselves. This is a “Field Trial,” so basic is just about all we should expect. Given the reported adoption rates, I think it’s made a pretty good start.
Let’s take those points one at a time, with pictures.
Persistent URIs for People
Everyone with a G+ account gets assigned a random string of numbers as their unique identifier. If you wish to discover yours, hover over your name and look to the status bar. If you want an easier way to copy & paste it, then visit profiles.google.com/yourusername and you will be redirected to your numerical equivalent, on your “posts” tab. Why not use the name you’ve indicated on your account as your identifier? Simply put, names change. A traditional example is marriage – one or both parties to the marriage usually change their name. Rather than managing the complex process of deprecation and subsumption of changing labels (names), a random unique identifier is used. This identifier is persistent. It also helps protect users by hiding their presumed email address and account usernames from spammers and crackers. You can swap out “posts” for any of your G+ tabs to see publicly available content: Posts, About, Photos, Videos, Buzz.
How is this ontological? Well, you now have a URI. You are the subject in the subject-predicate-object triple which is the foundational element of an ontology. You find, either by searching for or G+ recommending to you, people to whom you want to connect. They also have a URI. They are the object in the triple. You put them into a Circle. The Circle has a label – all of the people in this Circle are “Friends.” The Circle is the predicate of your triple, and has been identified as being the predicate “Friends.” How simple is that? Over 20 million people are now building ontologies of their social networks, and most don’t even know it. Somewhere in my psyche is my inner Brain cackling at how we’re going to take over the world.
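To make the triple concrete, here is a minimal sketch in Python. It is not how G+ stores anything; the `example.org` URIs, the `Triple` type and the helper functions are all hypothetical, chosen only to show that dropping a person into a labeled Circle amounts to recording one subject-predicate-object statement.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the person who owns the Circle
    predicate: str  # the Circle's label, acting as the relationship type
    obj: str        # URI of the person placed in the Circle


# Hypothetical identifiers for illustration only.
ME = "https://example.org/people/1001"
MOM = "https://example.org/people/1002"
BOB = "https://example.org/people/1003"


def add_to_circle(graph: set, owner: str, circle: str, member: str) -> Triple:
    """Record 'owner --circle--> member' as a single triple."""
    triple = Triple(owner, circle, member)
    graph.add(triple)
    return triple


def members_of(graph: set, owner: str, circle: str) -> list:
    """Return the URIs the owner has placed in the named Circle."""
    return sorted(t.obj for t in graph
                  if t.subject == owner and t.predicate == circle)


graph = set()
add_to_circle(graph, ME, "Family", MOM)
add_to_circle(graph, ME, "Friends", BOB)
```

Calling `members_of(graph, ME, "Family")` returns a list containing only Mom's URI: one drag and drop, one typed link.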
Persistent URIs for Sparks
Sparks is the name Google uses to refer to concepts, also known as subjects, categories, tags, terms (you get the idea). When you click on one, you get a very clean list of search results – which are not the same as what you’d get by searching at Google.com. Sparks are a very simple taxonomy right now, but do have persistent URIs, which you can find by hovering over a Spark and looking in the left of the status bar at the bottom of your screen.
This warrants more dissecting and attention. Will they eventually use all or some of the hierarchy of Google Directory? Will they become hierarchical? Will the algorithm improve as we click on links that interest us? Can we add our own information? Are we creating new entities for Google as we search for and add Sparks to our items of interest? It seems that way. It’s not an ontology yet, but it’s a start. Lots of people creating persistent URIs for entities they’ve dreamed up – I hear that evil cackle again!
One interesting thing to note, especially for those who aren’t clear on the fact that semantic data can be kept private: these nodes can be hidden. For example, I can’t see the semantic web URI when I’m logged out – it’s locked down. It asks me to log in.
Persistent URIs for Posts
Happily, the screenshot I have to illustrate persistent URIs for posts is of a post I’ve shared publicly to congratulate a colleague and share his good news.
In this post I congratulate a longtime member of this community on the publication of his new book. You can see its time & date, who I shared it with, whether it has been reshared from me to others, who else has commented on it, and more. The capturing of provenance is fabulous on G+. If I reshare something, it doesn’t tell others from whom I got it; it tells us from where it started. This is a critical component of determining trust and authority. I can also add information; for example, I could easily have added a “Buy it here” link to my comments on the book, and embedded my Amazon Associates code to get a few pennies from any sales made to folks in my extended networks. Knowing that I boosted sales is informative and may please Bob, unless of course it means his royalty checks are smaller, in which case he can tell me he followed the paper trail and please cut it out!
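The provenance behavior described here, where a reshare points back to where the post started rather than to the last person who passed it along, can be sketched as a walk up a chain of links. This is only an illustration under assumed names: the `Post` class, its `reshared_from` field and the `example.org` URIs are hypothetical and say nothing about how G+ actually models posts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    uri: str                                 # persistent identifier of the post
    author: str                              # who published this post or reshare
    reshared_from: Optional["Post"] = None   # None means this is the original


def origin(post: Post) -> Post:
    """Follow reshare links back to the post where the chain started."""
    while post.reshared_from is not None:
        post = post.reshared_from
    return post


# Hypothetical chain: an original post, reshared twice.
original = Post("https://example.org/posts/1", "bob")
first_reshare = Post("https://example.org/posts/2", "alice", original)
second_reshare = Post("https://example.org/posts/3", "carol", first_reshare)
```

Here `origin(second_reshare)` is the original post by `bob`, not the intermediate reshare by `alice`, which is the property that matters for judging trust and authority.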
Why this Matters
Google, by nature of its founding, is in a prime position to address the challenges that many enterprise technologists have when thinking about semantic data – how do we handle unstructured data? We have metadata: in schema, in taxonomies, in ontologies even. We have loads of content. With no metadata. How do we get them together? We can’t afford to hire a small army of indexers to apply the metadata to the content. The system metadata is insufficient and poor. We have a pretty good search tool, and have put some effort into data dictionaries, entity extraction and rules-based classification. We have tools that do latent semantic indexing and latent semantic analysis. Make sense of unstructured information? Sure, Google can do that. Hopefully they will not reduce efforts in these areas too much to focus on other projects. Many of us can execute a search and return nothing useful; crowdsourcing tagging in G+ may re-vitalize components of the search algorithm.
They’ve learned. From Search and News they’ve learned how to refine their algorithms to return content of a certain quality. From Uncle Sam, Froogle, Scholar, Custom Search and more they’ve learned how to separate the wheat from the chaff within domains of knowledge. From Reader, Knol and Bookmarks they’ve learned how people tag and categorize things. From Orkut and Wave they’ve learned, well, what doesn’t really work.
There are many who believe that the key thing about G+ is the integration of services – social, email, calendar, documents, photos, videos and more. Absolutely! Wouldn’t we all like to have ONE place from which to manage our lives online and off? One that is the same no matter what machine we log in from, no matter from where? Isn’t that one point of the semantic web – having the data I want access to be available to me from anywhere, when I need to pull it in, in a way that makes sense to me at the time?
It’s not Quixotic. The quest is being realized. Nodes of information are being networked, with typed links. Algorithms are being tweaked, are learning and applying their knowledge to content by machine processing. Google is using its base foundation – algorithmic search.
It’s not Sadistic. “Come to me graduate assistants and interns, let us toil together to teach these machines how we think. We shall build networks of data that they may process the data and come to understand.”
It’s not even Shoeless: “If you build it, [they] will come.” It’s better than that – If we make it simple, THEY will build it. And that is the key to growing the availability of linked data – keep it simple for the average netizen. See a picture of your Mom as a suggested connection. Drag and drop the picture to a Circle they’ve given you called Family. That’s easy. Mom herself can do that. And you’ve just created a triple.
Speaking of Mom, I’ll take a moment to acknowledge the privacy implications. What do we want her to see? Or our bosses? My full thoughts on the privacy and security implications are long enough for their own write-up and will keep for another time. Is it something we need to consider? Yes. Is it something we need to panic about? I don’t think so, if we remember that nothing online is truly private. The evolution we are experiencing is not much different from that experienced by public figures when radio and television revolutionized communications. It is simply being felt by ALL who can participate in these internet social networks. As for me, I’m going to treat my online life the way I treat my offline life, thinking to myself, if I do or say this, what would my parents think? Would they be proud of me? Amused? Skeptical? Ashamed? I’m going to try and front-load that scale. With the occasional poke at outdated traditions.
Google must maintain its focus on the user experience: designing simple user interaction models, listening to feedback, moving forward carefully by balancing what the users want, what the platform requires, and what excites its product team. If I had the chance to submit my +1 on the bucket list of features, I would vote for the ability to make Circles and Sparks hierarchical. I would like the ability to say ‘No’ – post this to my Family, except Aunt Jo who has no sense of humor. (Slowly add domains and ranges and disjoints.) I would like to be able to use the URIs! Tag things with the URI of a Spark as opposed to just +1ing them as I traverse the web.
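Two of these wishes, hierarchical Circles and the ability to say “No,” come down to simple set operations. The sketch below is a thought experiment, not a G+ feature or API; the Circle names, members and the `expand` and `audience` helpers are all made up for illustration.

```python
# Hypothetical Circles and their direct members.
circles = {
    "Family": {"Mom", "Dad", "Aunt Jo"},
    "Cousins": {"Sam", "Lee"},
}

# Hypothetical hierarchy: Cousins is a sub-Circle of Family.
# Assumes the hierarchy has no cycles.
subcircles = {"Family": ["Cousins"]}


def expand(circle: str) -> set:
    """Collect a Circle's direct members plus those of all its sub-Circles."""
    members = set(circles.get(circle, set()))
    for child in subcircles.get(circle, []):
        members |= expand(child)
    return members


def audience(circle: str, excluding=frozenset()) -> set:
    """Everyone in the Circle (and its sub-Circles) minus the excluded people."""
    return expand(circle) - set(excluding)
```

With these definitions, `audience("Family", excluding={"Aunt Jo"})` reaches Mom, Dad, Sam and Lee, and leaves Aunt Jo out.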
Before you protest, I will acknowledge: This is not yet a fully-fledged member of the linked data world. It is semantic. I look forward to seeing what developments are to come, and hope that we in the semantic technology community come to engage in this development in positive ways. There are many things on the wish lists of many communities: social, SEO/SEM, news, publishing, education. Semantics will be the foundation for them all. Keep Circling, enjoy your favorite Sparks, and most importantly: Happy Modeling!