Archives: July 2009

Making the Web Smarter

Fred Wilson, principal, Union Square Ventures:

The dream of the semantic web has been upon us for quite a while now. There have been hundreds of academic research projects, hundreds of approaches, and hundreds of startups working in this space. We have several in our portfolio like Adaptive Blue, Zemanta, Outside.in, Infongen, and one more we have not yet announced.

But this is a hard problem to solve and I don’t see a single clear path to getting it solved. And what’s interesting to note is that the most ambitious approaches have largely been failures. If anything, the more pedestrian approaches are showing more promise.

Semantic Game, Set, Match

Jennifer Zaino
SemanticWeb.com Contributor

Semantic technology vendor TextWise is working on enterprise problems that range from reducing the cost of customer service to monetizing patents more efficiently. Basically, its technology is used to match something to something else, whether it’s an ad to a web page, a news article to another news article, a blog to an internal document, and so on.

Right now, one of its focal points is helping businesses better enable customer self-service. What’s the problem there? Typically, the knowledge bases that would normally answer customers’ questions online are siloed in different content management systems, explains TextWise CEO Connie Kenneally. That makes it hard for customers to be sure that self-service systems are giving them all the information they need. Companies can save a lot of money by letting customers answer their own questions, though, considering that it can cost a call center $20 or more a minute to answer questions using human employees.

“By tagging all that content with our semantic signatures, you can relate information that’s similar to what people are asking questions about,” she says. “It puts all the content management systems on the same lingua franca, so these walled silos can be opened up and people can get into them without actually going into them.”

The vendor, whose roots are in contextual advertising, defines its semantic signatures as a granular way to match content by accurately describing the ‘aboutness’ of the text. The aim is to make it easier for enterprise users, or their customers, to find things online or on a corporate network by identifying which concepts are distinctive in a document and then finding other documents related to those concepts. To match one thing to others, “we represent text in a way that provides deeper meaning of it than either keywords or entities, so the documents or form entries can be used as queries or exemplars for finding other similar information,” Kenneally says.

Take, for example, an article about Microsoft’s return to digital rights management – the text itself may never use the words “intellectual property” or “copyright,” but TextWise’s technology understands that that is, in fact, what the piece is about. So, it would return a semantic signature that lists and weights all the concepts reflected in the text. “The signature lets you see what concepts in this article are important and which are not,” she notes, and then use that to make more connections.
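To make the idea of a weighted concept signature concrete, here is a minimal sketch in Python. It is not TextWise's algorithm or the SemanticHacker API: it simply assumes a signature can be modeled as a dictionary mapping concepts to weights, and it swaps in plain cosine similarity as the matching measure. The function names, concepts, and weights are all invented for illustration.

```python
import math


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity between two {concept: weight} dictionaries."""
    shared = set(sig_a) & set(sig_b)
    dot = sum(sig_a[c] * sig_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in sig_a.values()))
    norm_b = math.sqrt(sum(w * w for w in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_matches(query_sig, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_sig, sig))
        for name, sig in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical signatures: the article never says "intellectual property",
# but its signature still carries that concept with a high weight.
drm_article = {"copyright": 0.8, "intellectual property": 0.6, "software": 0.3}
candidates = {
    "patent_filing": {"intellectual property": 0.9, "patents": 0.7},
    "recipe_blog": {"cooking": 0.9, "baking": 0.5},
}

ranking = rank_matches(drm_article, candidates)
for name, score in ranking:
    print(name, round(score, 3))
```

In this toy setup the patent document ranks first because it shares a heavily weighted concept with the article, while the recipe blog scores zero because the two signatures have no concepts in common.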

TextWise thinks there are ways to take its technology further to benefit business. For example, it is addressing intellectual property issues by applying semantic signatures to the full U.S. patent database, “so larger companies looking to monetize their patent portfolio can match [that] to [patents] that are similar even though [they] may not appear to be so in the patent classes provided by the U.S. PTO [Patent and Trademark Office].”

Letting developers play with API

That project is currently in a seed state, with plans to go into beta testing with some larger businesses that have very large patent holdings. “That just shows the depth of the technology,” Kenneally says. “A lot of people can go into general content and tag that with something and be able to match it, but when you are talking about something as deep as patents, that’s hard. But signatures do a really good job with it.”

She also sees opportunity in the medical informatics and pharmaceuticals research areas, given the depth of data in these spaces and the difficulties traditional search tools have dealing with queries.

At the same time, TextWise doesn’t think it knows all the problems businesses are trying to solve, which is why it’s put the SemanticHacker API out on the web for developers to play around with themselves. The API is on the TextWise Semantic Cloud to let developers engage with its semantic signatures.

“We felt that by putting the API out we would get a lot of interesting ideas and feedback as to where this best plays,” Kenneally says. “We don’t have a monopoly on what the winning ideas will be.”

Recently, for example, she noticed that someone put up a video matching application on the web using TextWise’s technology, not something she would have thought of doing at all. “You just don’t know where these things will lead,” she says.

Putting the API out there helps build the ecosystem for the semantic web, she believes. Any developer can make up to 20,000 queries a day at no charge to bring their own ideas to fruition. Once they go above that, with the application already designed, tested and running, they can enter into a licensing agreement and a matching program with TextWise to tag all their own content with TextWise’s semantic signatures.

“The reason for our cloud is that we host those signatures so we can then match them to your content,” Kenneally says. “Our technology is computationally intensive, and rather than requiring a company to purchase and configure a large number of servers, we put it in the cloud and price on a per million query basis.”

Artificial intelligence software pioneer Cognitive Match raises … – PRLog.Org (press release)

“This is a very exciting investment for Dawn Capital and an important Web 3.0 investment in TMT, one of our focus sectors,” said Haakon Overli,

Zemanta challenges link-preview startups with Zemanta Balloons – VentureBeat


Second, they seem like a good fit for Zemanta’s core product — Zemanta uses semantic technology to recommend links, images, and other content as you write a

Semantic, Social Technologies Dutch Treat For Netherlands Newspaper

Jennifer Zaino
SemanticWeb.com Contributor

The newspaper publishing industry is facing some challenges on both sides of the Atlantic. Just ask Martijn Wuite, Internet manager at Dutch newspaper Het Parool:

“The state of the Dutch newspaper industry is tense at the moment with a major takeover taking place, the expected redundancies and national print advertising revenues under pressure,” he says. “It is important to increase readership by new methods to raise traffic for a publisher’s network to improve online advertising revenues and always market your content and brand.”

So the publication, with a readership of 89,000, has turned to f»dforward (pronounced ‘feedforward’), a widget-based recommendation engine from Kimengi that lets the online version of the paper connect with blogs and other sources and incorporates social elements.

Rather than creating a walled garden of content and locking readers in, the newspaper is taking the opposite approach, with the goal of becoming part of the ongoing conversation online rather than risk being left out of it. To that end, there’s a need to help connect readers with other content that may be directly related to a particular article they’re already reading or otherwise tailored to pique their interest, Wuite notes in an email interview.

“When you want to be relevant, a site or blog should offer relevant content which can be on topic, but also can be based on topics of personal interest, not directly related to the initial topic in mind,” Wuite says.

f»dforward attempts to accomplish that, enabling lateral linking through a combination of semantic web technologies, social media connections, and the read/write web.

“In the beginning stage of the web, people could surf around and go to different topics based on context and interests and using writers of web sites who provided hyperlinks,” says Lucien Burm, CEO and co-founder of Kimengi. “But with the scale of the web today it’s been virtually impossible for a publisher to link to everything.”

Recommendations in real time

F»dforward provides a semantic technology element that connects topics, pages, parts of pages, or whole web sites wherever their content is similar or complementary. It uses knowledge of readers’ social web connections (authorized by the user, of course) to understand their interests, what experts they follow online, and the like. It also takes into account that a reader might be a writer (of blogs, tweets, etc.) in their own right, with an eye to what their writing reveals about their tastes. Then it uses this information to help content publishers deliver real-time recommendations matched to that individual, drawn either from that publisher’s own content or from the wider network of content creators who are part of the f»dforward network.

“We are not indexing the whole web and trying to provide that structured data,” says Burm. “We have web sites like publishers who grow the network. They connect with other web sites and through these three technologies we try to make connections among them for people.”

Wuite says Het Parool has luckily seen a lot of growth in the last year, even amid the industry’s difficulties. Het Parool Digital in June 2008 switched to a modern publishing platform that gave it the ability to roll out the Parool.nl news site, Ondergrond.tv videoblog, a jobs section, events guide, and a mobile news site, among other things, with new features such as the Parool videosite, Parool.tv, coming up in August.

“The decision to start using f»dforward has to do a lot with our open mentality towards new developments and the fact that most of us work here on the verge of content and tech for more than 10 years now, so we like to trial and error new functionalities and features on our website,” Wuite says. The paper has been testing f»dforward “under the hood” for some time, officially launching it last week on the site. “The network that f»dforward has obtained is the potential of titles that can be recommended and suggested automatically,” he says.

Given that it’s only been live for about a week, he isn’t surprised that users seem to need a little time to get used to it, because its suggestions are not the classic recommendations, based on topic relevance alone, that they might be expecting. “However, once they get a little bit more accustomed to the suggestions, you notice a higher acceptance of the broadened suggested topics, which in the end are based on the user’s online trail he or she leaves behind,” Wuite says.

New Browsing Software Reveals Hidden Linkages Among Data

Deborah Gage
SemanticWeb.com Contributor

Sig.ma was made public this week, offering what its creators say is a new way to show relationships among data that’s scattered all over the Web.

Released by the Digital Enterprise Research Institute (DERI) in Galway, Ireland, the software combines a search engine with a browser, a mashup generator and an Application Programming Interface for developers. Search results can also be turned into widgets and will remain live even after they’re embedded in e-mails or Tweets or blogs.

How well Sig.ma will work remains to be seen. Its results don’t always make sense, although they are attractively displayed. The software aggregates links and pictures about a search term, organizes them into facts and lists them in a column – the “sigma” – down the center of a page.

If you’re searching for a person, Sig.ma may return that person’s e-mail address, workplace or associates. Mouse over any of these facts and Sig.ma will tell you where they came from – it returns as many as 20 sources and lists them down the right side of the page next to the facts.

Several big Web sites – Google, Yahoo, LinkedIn and some others – have started tagging some of their content according to emerging metadata standards so it can be understood by machines, and therefore read more easily by humans.

LinkedIn, for instance, marks first and last names and other basic personal information on its public profiles. Google in May started rolling out “rich snippets” [http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html] – short summaries that accompany Google search results for people or reviews. They show up when Webmasters follow Google’s instructions for marking their content with microformat or RDFa tags, two of the better known standards.
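As a rough illustration of what machine-readable name markup looks like to a program, the following Python sketch pulls the `given-name` and `family-name` properties of the hCard microformat out of an HTML fragment using only the standard library. The sample markup and the class and function names are invented for this example and are not taken from LinkedIn or Google; a real consumer would also need to handle nested elements, multiple cards per page, and RDFa attributes.

```python
from html.parser import HTMLParser


class HCardNameParser(HTMLParser):
    """Collects the text of elements whose class marks an hCard name part."""

    FIELDS = ("given-name", "family-name")

    def __init__(self):
        super().__init__()
        self.names = {}
        self._current = None  # hCard field whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated values.
        classes = (dict(attrs).get("class") or "").split()
        for field in self.FIELDS:
            if field in classes:
                self._current = field

    def handle_data(self, data):
        # Store the first non-blank text that follows a marked element.
        if self._current and data.strip():
            self.names[self._current] = data.strip()
            self._current = None


def extract_name(html):
    """Return a dict of the hCard name parts found in the HTML fragment."""
    parser = HCardNameParser()
    parser.feed(html)
    parser.close()
    return parser.names


# Hypothetical profile markup, written for this example only.
sample = (
    '<div class="vcard"><span class="fn n">'
    '<span class="given-name">Ada</span> '
    '<span class="family-name">Lovelace</span>'
    "</span></div>"
)

print(extract_name(sample))
```

The point of the markup is that the parser never has to guess which words on the page are a name: the publisher has labeled them.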

But it’s still difficult to collect and see semantically tagged information when it’s scattered around the Web. The creators of Sig.ma believe they have solved that problem.

“When we first saw the B&W pictures…pop up automatically the first time we ran Sigma we were really excited,” wrote Giovanni Tummarello in a blog post announcing the software. “That DERI data had been there forever yet never meaningfully used or integrated — let alone automatically!

“…But here it was! That file was there, discovered automatically and contributing marvelously to the mashup providing information about papers, (including technical reports that would not be listed otherwise) an extra picture, the phone number, a confirmation of the personal homepage, research projects and more.”

Sig.ma’s results are skewed by how often information about a search term is semantically tagged – not often for some people. Sig.ma’s search for Stefan Decker, a professor and director at DERI, delivered 50 sources, almost all of them right. But the name Deborah Gage was more of a mystery. Sig.ma delivered 14 sources, all but one of them wrong.

Fortunately, however, Sig.ma can be taught – users can accept or reject sources based on whether the facts the software lists about their search terms are true. Sig.ma is hampered by today’s Web, Tummarello wrote, where until recently data was marked up semantically “on a best effort-hacker enthusiastic-leap of faith way.

“Now that Google and Yahoo are starting to recognize the value of page markup, it is realistic to expect improvements in data coverage and quality,” he said.
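The accept-or-reject behavior described above can be pictured with a small data structure. The Python sketch below is purely illustrative and says nothing about how Sig.ma is actually implemented: it assumes each fact is a (property, value) pair that remembers which sources asserted it, and that rejecting a source hides any fact no other source supports. The class name, method names, and example URLs are hypothetical.

```python
from collections import defaultdict


class FactAggregator:
    """Keeps facts together with the sources that asserted them."""

    def __init__(self):
        # (property, value) -> set of source URLs that asserted the fact
        self._sources = defaultdict(set)
        self._rejected = set()

    def add(self, prop, value, source):
        """Record that a source asserted the fact (prop, value)."""
        self._sources[(prop, value)].add(source)

    def reject(self, source):
        """Mark a source as wrong for this search, as a user might."""
        self._rejected.add(source)

    def facts(self):
        """Return facts still backed by at least one non-rejected source."""
        result = {}
        for fact, sources in self._sources.items():
            kept = sources - self._rejected
            if kept:
                result[fact] = kept
        return result


# Hypothetical sources and facts about one search term.
agg = FactAggregator()
agg.add("workplace", "DERI", "http://example.org/staff")
agg.add("workplace", "DERI", "http://example.org/papers")
agg.add("workplace", "Acme Bakery", "http://example.net/other-person")

agg.reject("http://example.net/other-person")
print(agg.facts())
```

Keeping the provenance next to each fact is what makes the correction cheap: rejecting one bad source removes everything that depended on it alone and leaves corroborated facts untouched.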

Sig.ma is open source, and its creators plan to release the software’s index by the end of the week. The reasoning engine will follow.

Behind The Microsoft-Yahoo Search Deal

Jennifer Zaino
SemanticWeb.com Contributor

Microsoft and Yahoo are together at last, having just signed a 10-year deal where Microsoft’s Bing will power Yahoo! search, and Yahoo! will become the exclusive worldwide relationship sales force for both companies’ premium search advertisers.

Yahoo! will continue to use its technology and data in other areas of its business, a press release on the deal also said. During a conference call this morning to discuss the deal, Yahoo! CEO Carol Bartz and Microsoft CEO Steve Ballmer responded to a question about the deal’s impact as it relates to internal search in Yahoo! properties such as del.icio.us.

“We’ve been talking about web search,” said Bartz. “When we talk about internal Yahoo! search that is some of the innovation we’re looking at doing. That’s one of the issues we really needed to work out.”

At what point, for example, does Yahoo! take and link information from Bing and do something on top of it, as opposed to what it does horizontally, she asked. With such things in mind, a key component of the deal was that Yahoo! would have full flexibility in what it can do inside its sites, she said. Earlier, in response to a question about how Yahoo! can innovate if Microsoft is in charge of the technology, she had noted that a lot of innovation happens above the search results.

Ballmer was quick to confirm Bartz on the flexibility point. “We’re anxious to see Yahoo! take full advantage in various parts of its network through our search technology, so it was important to structure an agreement that gave Yahoo! full flexibility,” Ballmer said. “Exactly where that will play off, Yahoo’s team I’m sure will tell you over time.”

Better value for advertisers?

But innovation in search should extend beyond the technology, Bartz said. The deal is touted as delivering not only more innovation in search but a better value for advertisers, who will benefit from its scale and enjoy greater ease of use and efficiencies working with a single platform and sales team, according to the companies.

“I think we should talk more often about the innovation on the sales and marketing side of it,” Bartz said. “Online advertising is in its pre-infancy and how we work together with the large CMOs and marketers and large agencies to really bring digital advertising to … where it should be, considering the amount of time consumers are spending online, is also innovation that Yahoo! is ready to step up to.”

Both parties emphasized that one of the factors that made the deal work this time around was that they spent a lot of time defining the partnership.

“We were really trying to run a long-term business to invest for our success and our future,” said Bartz. “We felt this is a true partnership with technology and selling… Both have real skin in the game and there’s real excitement around it. The most important thing for us is that Yahoo! needed to get focus and focus again on what our mission is: to be the center of people’s lives online, and that is about great content on audience properties, a great mobile experience, and powered, again, by this technology that Microsoft has stepped up to.”

Speaking of mobile experiences, Yahoo! has the option of using Microsoft’s technology for its mobile platforms, but Bartz also emphasized that that is an option and not an exclusive requirement as it is on the PC end. “If somewhere down the road we wanted to switch, we could,” she said.

“It’s not like we came here with a two-page term sheet,” Ballmer added on the point of how the companies got the deal to work out this time around. “There are well over a hundred pages written to describe what we’re doing. It’s important to say this is what we were spending our time on.”

It wasn’t high-level abstractions, he said; rather, they worked through the operating principles of what the co-operation looks like in the partnership, “and that made us confront a lot of issues.” For more information on the specifics of the deal, you can find the press release at their joint web site.

Making The Web Smarter – The Industry Standard


Adding Meaning to Millions of Numbers – MIT Technology Review

True Engineering Technology, a startup based in Cambridge, MA, has now developed semantic technology that adds meaning to numerical data to help prevent

SPIN Diff: Rule-based Comparison of RDF Models

Composing the Semantic Web
One of the new features of the Maestro Edition of TopBraid Composer 3.1 is a simple yet very flexible diff tool that can be used to compare two versions of an RDF file or database. Diffing is a common requirement of collaborative modeling work, but conventional text-based diff tools often fail miserably on RDF-based data.
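A short sketch helps show why line-based diffs struggle with RDF: an RDF graph is a set of triples, so the same model can be serialized in any statement order. The Python example below is not SPIN Diff and does not use TopBraid Composer. It assumes a simplified one-triple-per-line syntax invented for this illustration and compares two versions as sets; real RDF comparison also has to deal with blank nodes, prefixes, and multi-line serializations.

```python
def parse_triples(text):
    """Parse a simplified one-triple-per-line format into a set of tuples."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Drop the trailing " ." terminator, then split into three parts;
        # the object keeps any internal spaces (e.g. a quoted literal).
        subject, predicate, obj = line.rstrip(" .").split(None, 2)
        triples.add((subject, predicate, obj))
    return triples


def diff_triples(old, new):
    """Return the triples added to and removed from the old version."""
    return {"added": new - old, "removed": old - new}


# Hypothetical model versions: the statements are reordered and one
# label changes. A text diff would flag every line as different.
old_text = """
ex:Person rdfs:label "Person" .
ex:Person rdfs:subClassOf ex:Agent .
"""
new_text = """
ex:Person rdfs:subClassOf ex:Agent .
ex:Person rdfs:label "Human" .
"""

changes = diff_triples(parse_triples(old_text), parse_triples(new_text))
print(changes)
```

Because the comparison works on sets, the reordered `rdfs:subClassOf` statement produces no change at all, and only the edited label shows up as one removed and one added triple.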
