Russia’s leading search engine Yandex – which is collaborating with U.S. search engine giants in implementing schema.org and which last week partnered with Twitter to post tweets in real-time in search results – has made another deal this week: It’s working with Topsy Labs to enable social search.
Real-time search and analytics provider Topsy’s indexing and live-ranking will help Yandex search in Russia and Turkey identify and extract fresh and relevant results from social media sources. Vipul Prakash, Topsy’s co-founder and Chief Technology Officer, says Topsy’s corpus consists of about 100 billion tweets, plus the page links and media referred to in them, all time-stamped and with explicit authorship. It does some synthetic tagging to extract the topic of a tweet and make it searchable, and it classifies the content of links referenced in tweets, where there is more text to work with. It distinguishes the author of a post from what is being discussed and tracks who is referring to whom; this feeds into its graph of influence, which ranks links in search results based on the influence of the people talking about them. That includes a global rank for each user, independent of topic and terms, as well as keyword-level ranks based on what was in a tweet when the user got attention for it.
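To make the idea of ranking links by the influence of the people sharing them more concrete, here is a minimal sketch in Python. It is not Topsy’s algorithm, which the company has not detailed here; the function name `rank_links`, the influence scores and the additive weighting are all assumptions chosen for illustration.

```python
from collections import defaultdict

def rank_links(mentions, global_influence, keyword_influence=None, keyword=None):
    """Order links by the summed influence of the distinct authors sharing them.

    mentions: iterable of (author, link) pairs (hypothetical input format).
    global_influence: dict mapping author -> topic-independent score.
    keyword_influence: optional dict mapping (author, keyword) -> score.
    """
    keyword_influence = keyword_influence or {}
    scores = defaultdict(float)
    seen = set()
    for author, link in mentions:
        if (author, link) in seen:
            continue  # count each author once per link, so repeat posts add nothing
        seen.add((author, link))
        weight = global_influence.get(author, 0.0)
        if keyword is not None:
            # Assumed combination rule: add the author's keyword-level score.
            weight += keyword_influence.get((author, keyword), 0.0)
        scores[link] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda link: (-scores[link], link))

mentions = [
    ("alice", "a.example"),
    ("bob", "a.example"),
    ("bot1", "spam.example"),
    ("bot1", "spam.example"),
    ("bot2", "spam.example"),
]
global_influence = {"alice": 0.9, "bob": 0.5, "bot1": 0.01, "bot2": 0.01}
ranked = rank_links(mentions, global_influence)
```

In this toy data, the link shared by two established accounts outranks the one pushed repeatedly by low-influence accounts, which is the intuition behind using influence rather than raw volume.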
Because it has such extensive histories of people from which to extract a robust understanding of their network credibility, including how they’ve received attention from others in the past, Topsy does a really good job of getting rid of spam, Prakash says. That’s a particularly useful capability to bring to Yandex for weeding out suspicious tweets ahead of the controversial Russian presidential election taking place this weekend, as reports have noted that fake Twitter accounts have been created to drown out opposition voices by flooding Twitter hashtags. “In Russia there is a lot of precedent for political activism like that,” he says. “If something points out a problem with a candidate, they will have people start spamming it so you can’t actually find the real piece of information.”
Consumers can use Topsy’s service free of charge. For large partners, which now include Yandex, it deploys its search stack inside their data centers. For Yandex, says Prakash, a “motivation is having fresh results. About 10 percent of web searchers are looking for fresh content.” Bringing social networks into the search picture provides an incredible amount of signal for discovering fresh documents and ranking them, because new and interesting material starts spreading on Twitter, he says. “The cool thing is this is the first time social activity has results on the search page, globally in a large search engine that people can actually crowd-source what is important by sharing and propagating, and that shows up rapidly on search results for everyone else.”
Apart from the Yandex deal, Topsy also plans to offer customers more analytics products. It has built a sentiment analysis engine for Twitter that works at the level of keywords and entities and is designed specifically to deal with the service’s shorthand posts. For example, on its blog it discussed how it measured the public’s positive and negative sentiment around the iPhone 4S shortly after its announcement. Companies can use such technology in social media more efficiently and with greater relevance than traditional opinion polls, for example, he says, with the added advantage of Twitter users’ geo-location and timing information for further slicing and dicing of the findings.
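As a rough illustration of what keyword-level sentiment measurement means, the sketch below tallies positive, negative and neutral posts that mention a given keyword. It is a toy example only: the word lists, the function name `keyword_sentiment` and the sample posts are invented for illustration, and Topsy’s engine is presumably far more sophisticated about shorthand and context.

```python
import re
from collections import Counter

# Tiny hand-picked word lists, assumed purely for illustration.
POSITIVE = {"love", "great", "fast", "awesome"}
NEGATIVE = {"hate", "slow", "disappointing", "meh"}

def keyword_sentiment(posts, keyword):
    """Count positive, negative and neutral posts that mention `keyword`."""
    counts = Counter()
    kw = keyword.lower()
    for post in posts:
        tokens = re.findall(r"[a-z0-9#@']+", post.lower())
        if kw not in tokens:
            continue  # only posts mentioning the keyword are scored
        pos = sum(t in POSITIVE for t in tokens)
        neg = sum(t in NEGATIVE for t in tokens)
        if pos > neg:
            counts["positive"] += 1
        elif neg > pos:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts

posts = [
    "Love the new iPhone 4S camera",
    "iPhone 4S is disappointing, no redesign",
    "Waiting in line for the 4S",
    "Great weather today",
]
tally = keyword_sentiment(posts, "4S")
```

Grouping the same tallies by time stamp or location would give the kind of slicing Prakash describes.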
In the coming quarters he expects Topsy to offer some solutions leveraging its sentiment technologies. Says Prakash, “There is valuable information for a lot of businesses trying to figure out social media.”