Get More Out of User-Generated Content With Bueda Tag Transformation API
The game of tag in the web world isn’t as clear-cut as it is in the schoolyard. On the playground, when you tag someone, you know that person is “It.” Everyone else does too, and they take appropriate action based on that knowledge – that is, run. On the web, when you tag a photo, a video, or other rich or high-density content you create, you know what you’re talking about, but your meaning isn’t always as clear to others who also could take appropriate action if they had a better understanding of what you’ve posted.
Bueda is a hosted services startup that’s trying to help publishers of this user-generated content increase its value by improving their understanding of it. The basic idea is that an outfit – a YouTube or Flickr, for instance – could send Bueda the tags users attach to their content, and in return receive clean metadata and categories to add to that content to better match it to advertising opportunities, enhance additional content recommendations, and increase search accuracy. “It’s the usual things you can do with the semantic web but in a low friction and easy way,” says Bueda CEO and co-founder Vasco Pedro.
The company developed the matching engine API that does this to help it launch a business that would optimize online ad placement in rich media and user-generated content, “but the barrier to entry was large and in the process we realized the tools we were generating to do this ourselves were useful. So we decided to make our first product the API that lets people use our technology to get actionable information to target advertisers, do content recommendations better, and connect content to increase the accuracy of search, and we will build out from there,” Pedro says. Access to the API is free during the beta period, and this week Bueda is adding two capabilities: contextualization of tags (machine understanding that Turkey the Thanksgiving dinner and Turkey the country are different things) and tag expansion (suggestions of additional tags to add based on knowledge of existing tags). Also as part of this week’s update, it plans to cut average response time from the current 0.4 seconds to 0.2 seconds.
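The article doesn't say how Bueda generates tag-expansion suggestions. As a rough illustration only, the Python sketch below uses a swapped-in technique, simple tag co-occurrence counting over a hypothetical corpus of tagged items. The functions `build_cooccurrence` and `expand_tags` are invented for this example and are not part of the Bueda API, and unlike the contextualization Bueda describes, this sketch makes no attempt to tell Turkey the dinner from Turkey the country.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(tagged_items):
    """Count how often each unordered pair of tags appears on the same item."""
    pairs = Counter()
    for tags in tagged_items:
        # Sort so ("a", "b") and ("b", "a") are counted as the same pair.
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


def expand_tags(existing, pairs, top_n=3):
    """Suggest up to top_n new tags that most often co-occur with existing ones."""
    existing = set(existing)
    scores = Counter()
    for (a, b), count in pairs.items():
        if a in existing and b not in existing:
            scores[b] += count
        elif b in existing and a not in existing:
            scores[a] += count
    return [tag for tag, _ in scores.most_common(top_n)]
```

For a corpus such as `[["turkey", "thanksgiving", "dinner"], ["turkey", "thanksgiving", "family"], ["turkey", "istanbul", "travel"]]`, calling `expand_tags(["thanksgiving"], build_cooccurrence(corpus))` ranks "turkey" first, because it appears alongside "thanksgiving" on two items while "dinner" and "family" each appear on one.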
The main API’s secret sauce, as Pedro puts it, lies in providing access to a very large number of semantic resources in parallel. “It’s a coverage problem,” he says of tags in open domains. “You can only address this if you have enough semantic coverage to address a large portion of the body of knowledge in tags, which is where the federated ontology approach comes from. We can include lots of different resources in a scalable and easy-to-maintain way.” The API cleans tags (NYC, New York City, and NewYorkCity are all understood to be one and the same thing), normalizes them to the corresponding semantic tag as appropriate, and generates semantic categories for the content with which the tags are associated. “This gives publishers categories they can directly use with their advertisers,” he says. “We’re using a fixed set of categories now but we will migrate first to a more detailed set and then do transfer learning, where the customer can define the categories they expect and we can issue that directly.”
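To make the cleaning and categorization steps concrete, here is a minimal Python sketch, assuming a small hand-built alias table and a fixed category map. Both tables, the category names, and the functions `surface_key` and `clean_tags` are hypothetical; Bueda's federated-ontology approach draws on many semantic resources rather than a single dictionary like this one.

```python
import re

# Hypothetical alias table: normalized surface form -> canonical semantic tag.
ALIASES = {
    "nyc": "New York City",
    "newyorkcity": "New York City",
    "bsg": "Battlestar Galactica",
    "battlestargalactica": "Battlestar Galactica",
}

# Hypothetical fixed category set, keyed by canonical tag.
CATEGORIES = {
    "New York City": "Travel",
    "Battlestar Galactica": "Entertainment",
}


def surface_key(tag: str) -> str:
    """Lowercase a tag and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def clean_tags(raw_tags):
    """Map raw tags to deduplicated canonical tags (first-seen order) plus categories."""
    canonical = []
    for tag in raw_tags:
        # Unknown tags pass through unchanged apart from trimming whitespace.
        name = ALIASES.get(surface_key(tag), tag.strip())
        if name not in canonical:
            canonical.append(name)
    categories = sorted({CATEGORIES[c] for c in canonical if c in CATEGORIES})
    return canonical, categories
```

With this sketch, `clean_tags(["NYC", "New York City", "NewYorkCity"])` returns `(["New York City"], ["Travel"])`: all three surface forms collapse to one canonical tag, which then maps to a single category.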
Twenty-five API keys have been issued so far, he says, to both companies and individual developers since the soft launch early last month. “The companies we talk to, we have the use case scenario, but the most exciting thing is seeing what people are going to be doing with this,” he says. For example, there has been developer interest in using it to parse Twitter’s content – “the higher density of that content is harder for the computer to understand it, and that’s where the semantic web can bring value,” Pedro notes. Someone else is investigating its possible use in matching up soul-mates on dating web sites (whether a profile is tagged “Battlestar Galactica” or “BSG,” for instance), and one company is investigating using it to normalize its own metadata from many sources to eliminate redundancies and give it a clean list to use for its applications.
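The dating-site scenario can be illustrated the same way. This sketch assumes a hypothetical alias table in place of a real normalization service and scores two profiles by the Jaccard similarity (shared tags divided by all distinct tags) of their normalized tag sets. The names `normalize` and `profile_similarity` are illustrative only and do not come from the Bueda API.

```python
import re

# Hypothetical alias table standing in for a real tag-normalization service.
ALIASES = {"bsg": "battlestargalactica"}


def normalize(tag: str) -> str:
    """Reduce a tag to lowercase alphanumerics, then resolve known aliases."""
    key = re.sub(r"[^a-z0-9]", "", tag.lower())
    return ALIASES.get(key, key)


def profile_similarity(tags_a, tags_b) -> float:
    """Jaccard similarity of two profiles' normalized tag sets (0.0 to 1.0)."""
    a = {normalize(t) for t in tags_a}
    b = {normalize(t) for t in tags_b}
    if not a and not b:
        return 0.0  # Two empty profiles share nothing meaningful.
    return len(a & b) / len(a | b)
```

Here `profile_similarity(["Battlestar Galactica", "hiking"], ["BSG", "Hiking"])` returns `1.0`, whereas comparing the raw strings would find no overlap at all.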
Pedro emphasizes that the Bueda API isn’t focused so much on text, which typically has a lot of context and where, he says, other companies are doing great work on semantic analysis. “We believe we can make more of a difference where there’s less context,” he says. He notes that its technology could complement semantic solutions such as Zemanta, which enriches blog posts with tags, links and more, by providing clean information for the images and videos they index. Ditto for the semantic advertising space, “where Peer39 and others are doing great things, but there are not a lot of people in semantic advertising focused on user-generated content. The main reason is the lack of coverage. When you deal with text you can get by with Freebase, maybe along with Wordnet, for the taxonomy, but when you deal with tags that is not sufficient,” Pedro says.
Anyone interested in getting the Bueda API without signing on to the waiting list can follow this link.

Eric Franzon
VP Community
Jennifer Zaino
Contributor
Angela Guess Contributor