It’s gravity that keeps us tied to earth, and it’s Gravity that seeks to tie us to our personal world of interests on the web. With a founding executive team of MySpace’s Amit Kapur (CEO), Jim Benedetto (CTO) and Steve Pearman (Chief Product Officer), Gravity is built on the premise of an interest graph that will make it possible for third-party developers and sites to personalize content – including fully personalized newspapers – and services for you.

Well, you’ve heard the personalized content story before in various ways, shapes and forms, right? Here’s why Gravity expects to be different. “The interest graph is the next filtration device across the Internet, to filter content personalized to me,” says Benedetto. “When I look at the web today I see the web you see….With the amount of information being created that methodology breaks down.” Of course, we can thank the very social networking trend that MySpace, Facebook and Twitter were so instrumental in encouraging for this state of affairs. It not only repeats but exacerbates a pattern the web has shown almost since its birth: information creation overwhelming the technology available to organize it. Now, Gravity wants to provide a web-scale way, built in consideration of the social graph but outside its confines, to surface from the massive amounts of information across that graph and the web at large the content that is implicitly and automatically personalized to users.

“The information overload today is overwhelming the social filters,” Benedetto says. “The technology is in place to create a truly semantic web.” The problem to be solved in truly personalizing content according to our social graph starts with the fact that it’s a mistake to assume that our profiles and connections give the real scoop on what we really care about. We may want to appear more interested in some things than others in our public faces, we may join groups recommended by a friend because we like the person and not the topic, and we’re unlikely to update the interests in our profiles once they’re created. That holds even when a significant life event changes those interests, and more often an interest is missing simply because something we really do care about wasn’t top of mind at the time. A point Benedetto makes is that there is content out there that you didn’t know you wanted to search for, but if you’re presented with the opportunity you suddenly remember something like: Gosh, I really did enjoy that trip to Costa Rica last year and I do want to know what’s going on there today.

“That’s the difference between explicitly asking what you are interested in or using implicit NLP and machine learning to look at people’s actions and behaviors and determine what their interests are,” he says. From there, the idea is to give users an opportunity to prune or augment those inferred interests.

Inferencing, Virality and Deep Personalization

Gravity couldn’t have acted on its web-scale proposition even a few years ago, before the arrival of publicly available data sets, open source solutions like graph storage databases, and cheaper cloud access to the expensive hardware resources needed to query its multi-layer ontology in a highly concurrent fashion. Gravity extracts structured data about users’ interests from the text in their social graphs and authored pieces, using NLP and topic matches against its dynamic web ontology, and then prioritizes those interests. Getting to that result is a process of inferencing – which it gets free with RDF but which required some changes to enable at massive scale, Benedetto says – and of graph traversals in real-time in memory. Inferencing is “a relatively hard problem for machine learning at scale but it is required to build the interest graph,” he says.

“For example, if I say on Twitter that I paddled out today and it was fun, we know that paddled out is a colloquial term, and has a high frequency of use in a particular subset of people and is a strong indicator of likelihood of something,” he explains – that something being surfing. “Because of inferencing, a large ontology and graph traversals in real-time in memory we can say paddled out relates to surfing, is an indicator this person likes surfing, and because of that we also go up the graph and would say he’s interested in outdoor sports,” he says. So, if you say five or ten things that are a strong indicator that the way you talk means you are a surfer, horizontal traversals would also make a connection from outdoor sports to water sports – but also could connect you to snowboard sports on the premise that that might be a winter-time interest.
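The “paddled out” example can be made concrete with a minimal sketch in Python. This is not Gravity’s implementation: the tiny ontology, the indicator phrases, the weights and the decay factor are all hypothetical, chosen only to illustrate how a colloquial phrase can map to a topic and then propagate up a graph to broader interests.

```python
from collections import defaultdict

# Hypothetical toy ontology: each topic points to its broader parent topic.
PARENT = {
    "surfing": "water sports",
    "water sports": "outdoor sports",
    "snowboarding": "snow sports",
    "snow sports": "outdoor sports",
}

# Hypothetical indicator phrases: phrase -> (topic, strength of the signal).
INDICATORS = {
    "paddled out": ("surfing", 0.9),
    "fresh powder": ("snowboarding", 0.8),
}


def infer_interests(posts, decay=0.5):
    """Score topics from indicator phrases, walking up the ontology.

    Each step up the graph multiplies the signal by `decay`, so broader
    topics receive a weaker score than the specific one that was matched.
    """
    scores = defaultdict(float)
    for post in posts:
        text = post.lower()
        for phrase, (topic, weight) in INDICATORS.items():
            if phrase not in text:
                continue
            node, signal = topic, weight
            while node is not None:
                scores[node] += signal
                node = PARENT.get(node)  # None once we pass the root
                signal *= decay
    return dict(scores)


interests = infer_interests(["I paddled out today and it was fun"])
for topic, value in sorted(interests.items(), key=lambda kv: -kv[1]):
    print(f"{topic}: {value:.3f}")
```

A single post yields a strong score for surfing and progressively weaker scores for water sports and outdoor sports. A horizontal traversal to a sibling such as snowboarding would need an additional step that walks back down from a shared ancestor, which this sketch leaves out.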

Here’s where the viral connection angle can come in. “You are probably interested in a lot of surfing articles but if there is an amazing snowboard article, we will show you those articles that are the nearest neighbors in the ontology that you may like because they are so viral,” he explains. It’s measuring that virality with some 50 million users on its Insights analytics platform. By combining a user’s interest gleaned from his web activity, intersected with Facebook OpenGraph Like interactions across 500 million users and access to Twitter’s firehose across some 200 million twitterers, “we can give you a [personalized] web that is a small subset of the highly popular web.”

(If something’s not tweeted about or liked, Gravity takes the view that it’s probably not good enough to draw to users’ attention. While its open web crawlers are out there scouring the web, those crawls aren’t particularly deep, by design. “Deep crawling the web is cheaper than ten years ago but it still can be cost prohibitive” for a start-up, Benedetto says.)

There are ten to 15 alpha partners currently and quietly piloting Gravity’s technology on their content sites, looking for the increased engagement and time commitments that can correlate to their revenues. Other approaches to such engagement, even semantic ones, he says, still work from the premise of trying to match the next piece of content the reader will be interested in to the article they are reading now. “Most of the popular ones are not even looking at other things I read over the last 20 minutes on the site, but at what page I’m looking at right now,” he says – the if-you-like-this-read-this model, or a prompt to visit this topic page. “This works and is something that increases engagement. But it’s very different – we try to analyze users’ core interest at web scale and merge that with the content they like, regardless of where they are coming from or the article they are reading right now.” It’s conceivable that such a deeply personalized news site could have ten sections, each showing the most popular articles related to a different interest of the user.

Content providers deploying this can go very deep, to the point of delivering a news site at the same URL that looks completely different to two different users based on their interest graphs. That requires Gravity’s API and some deeper integration and handholding to make it work. A more lightweight implementation doesn’t require the page to change at all but displays calls to action, such as a corner pop-up or top-down slider showing the next best article Gravity thinks you will like. “It shows you the next best article based on your interest graph, the semantic analysis of the article and its virality across the web,” he says. Gravity is still working to determine things such as how close in the ontology such recommendations should be to the content at hand.
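To illustrate how the three signals Benedetto names might be combined, here is a minimal sketch in Python. It is an assumption for illustration only, not Gravity’s API or ranking formula: the `Article` class, the `score` function, the linear blend and the `virality_weight` value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    topics: dict      # topic -> weight from semantic analysis of the text (0..1)
    virality: float   # normalized popularity across the web (0..1)


def score(article, interest_graph, virality_weight=0.3):
    """Blend interest match with virality (hypothetical linear blend)."""
    # Interest match: how strongly the article's topics overlap the user's.
    match = sum(
        interest_graph.get(topic, 0.0) * weight
        for topic, weight in article.topics.items()
    )
    return (1 - virality_weight) * match + virality_weight * article.virality


def rank(articles, interest_graph):
    """Return articles ordered from best to worst recommendation."""
    return sorted(articles, key=lambda a: score(a, interest_graph), reverse=True)


# Hypothetical user who mostly likes surfing, with a weaker snowboarding interest.
user_interests = {"surfing": 0.9, "snowboarding": 0.2}
candidates = [
    Article("Local politics", {"politics": 1.0}, virality=0.9),
    Article("Epic snowboard run", {"snowboarding": 1.0}, virality=1.0),
    Article("Surf report", {"surfing": 1.0}, virality=0.2),
]

ranked_titles = [a.title for a in rank(candidates, user_interests)]
print(ranked_titles)
```

With these made-up numbers the surf article ranks first, the highly viral snowboard article second, and the popular but off-interest politics article last. Raising `virality_weight` would let a sufficiently viral neighboring topic overtake a weakly shared core interest.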

“We’ve come to the shift where content generation is now in the hands of everyone,” Benedetto says. “That massive amount of content gets hard to leap through. Interest filtering and the interest graph and a truly semantic web makes it digestible again.”