Russia’s leading search engine Yandex, which is collaborating with U.S. search engine giants in implementing schema.org and which last week partnered with Twitter to post tweets in real time in search results, has made another deal this week: it is working with Topsy Labs to enable social search.
Topsy, a real-time search and analytics provider, will supply indexing and live ranking to help Yandex search in Russia and Turkey identify and extract fresh, relevant results from social media sources. Vipul Prakash, Topsy’s co-founder and Chief Technology Officer, says Topsy’s corpus consists of about 100 billion tweets, along with the page links and media referred to in them, all time-stamped and with explicit authorship. Topsy does some synthetic tagging to extract the topic of a tweet and make it searchable, and it classifies the content of links referenced in tweets, where there is more text to work with. It distinguishes the author from what is being discussed and tracks who is referring to whom in postings. Those signals feed its graph of influence, which ranks links in search results by the influence of the people talking about them. That influence includes a global rank for each user, independent of topic and terms, as well as keyword-level ranks based on what was in a tweet when the user got attention for it.
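To make the idea of influence-based ranking concrete, here is a minimal Python sketch. It is not Topsy’s algorithm, which the company has not detailed here. The function name `rank_links`, the additive scoring, and all of the sample accounts, links, and scores are assumptions made purely for illustration. The sketch only shows how a global rank and a keyword-level rank for each author could be combined to order the links those authors share.

```python
from collections import defaultdict


def rank_links(mentions, global_influence, keyword_influence, query):
    """Order links by the summed influence of the authors who shared them.

    Hypothetical inputs, invented for this illustration:
      mentions          -- iterable of (author, link, keywords) tuples
      global_influence  -- {author: score}, independent of topic
      keyword_influence -- {(author, keyword): score}, topic-specific
      query             -- the keyword being searched
    """
    scores = defaultdict(float)
    for author, link, keywords in mentions:
        if query not in keywords:
            continue  # this tweet is not about the query
        # Assumption: the two ranks are simply added together.
        base = global_influence.get(author, 0.0)
        topical = keyword_influence.get((author, query), 0.0)
        scores[link] += base + topical
    # Highest combined influence first.
    return sorted(scores, key=scores.get, reverse=True)


# Made-up example: two established accounts and two throwaway accounts.
mentions = [
    ("alice", "news.example/report", {"election"}),
    ("bob", "blog.example/analysis", {"election"}),
    ("throwaway1", "spam.example/x", {"election"}),
    ("throwaway2", "spam.example/x", {"election"}),
]
global_influence = {"alice": 0.9, "bob": 0.4, "throwaway1": 0.01, "throwaway2": 0.01}
keyword_influence = {("bob", "election"): 0.8}

ranked = rank_links(mentions, global_influence, keyword_influence, "election")
```

In this toy setup the link pushed by two low-influence accounts lands last even though it was shared twice, because volume alone adds little when the accounts behind it have no history of attention.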
Because it has such histories of people, from which it can extract a robust understanding of their network credibility, including how they’ve received attention from others in the past, Topsy does a really good job of getting rid of spam, Prakash says. That’s a particularly useful capability to bring to Yandex to weed out suspicious tweets in advance of the controversial Russian presidential elections getting underway this weekend, as reports have noted that fake Twitter accounts have been created to drown out opposition voices by flooding Twitter hashtags. “In Russia there is a lot of precedent for political activism like that,” he says. “If something points out a problem with a candidate, they will have people start spamming it so you can’t actually find the real piece of information.”