Diving Deeper into the Deep Web

Jennifer Zaino
SemanticWeb.com Contributor

If you think “semantic web” is still one of those terms the public hasn’t yet become cognizant of, just imagine the trouble you’d be in trying to explain the “deep web.”

In general, the deep web is easiest to approach through specific vertical interests, such as business and medicine, because people understand the concept of business and medical search portals.

“We’re trying to make the deep web more available to the public so they can understand what is in it and why it’s important,” says Brian Despain, VP at Deep Web Technologies. “The deep web is larger than the surface web in terms of content, and the content is almost entirely spam-free.”

Deep Web Technologies grew out of work that president and CEO Abe Lederman did while consulting at Los Alamos. Its offerings are built on a federated search platform that lets subscribers, from academics to analysts, inexpensively search the deep web and subscription sources from a single search box, and consumers can do the same through its free services. Those searches run in real time at Biznar, where users can search business-related sources; Mednar, a top medical search site; and Scitopia.org, where they can search research cited in scholarly work and patents.
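For readers new to the idea, federated search means sending one query to many sources at the same moment and merging whatever comes back, rather than searching a pre-built index. The sketch below is a minimal illustration under stated assumptions, not Deep Web Technologies' implementation: the source names (`source_a`, `slow_source`, `source_b`) and the `search_source` connector are hypothetical stand-ins for real source APIs, and network latency is simulated with `asyncio.sleep`.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    source: str
    title: str
    url: str


# Hypothetical connector: a real federated engine would translate the query
# into each source's native search API and parse its response.
async def search_source(name: str, query: str, delay: float) -> list[Result]:
    await asyncio.sleep(delay)  # stand-in for network latency
    return [Result(name, f"{query} result {i} from {name}", f"https://{name}.example/{i}")
            for i in range(2)]


async def federated_search(query: str, sources: dict[str, float],
                           timeout: float = 0.5) -> list[Result]:
    """Send one query to every source concurrently and merge what returns in time."""

    async def guarded(name: str, delay: float) -> list[Result]:
        try:
            return await asyncio.wait_for(search_source(name, query, delay), timeout)
        except asyncio.TimeoutError:
            return []  # a slow source is skipped rather than blocking the whole search

    batches = await asyncio.gather(*(guarded(n, d) for n, d in sources.items()))
    # Flatten the per-source batches, keeping the order the sources were listed in.
    return [r for batch in batches for r in batch]


if __name__ == "__main__":
    hits = asyncio.run(federated_search(
        "aspirin", {"source_a": 0.1, "slow_source": 2.0, "source_b": 0.2}))
    for hit in hits:
        print(hit.source, hit.title)
```

The per-source timeout is one plausible design choice: a real-time search can only be as fast as its slowest source, so capping the wait keeps one sluggish database from holding up everyone else's results.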

Results from subscription sources can still deliver metadata or initial information that helps users decide whether a result is a good fit. Deep Web is also planning to roll out a system that lets users who subscribe to those sources enter their own credentials and access them through Deep Web’s services.

“Where we are headed is to let users self-actualize or set up their own accounts right at Mednar and have access to their data services,” Despain says.

A corporate customer with subscriptions to a couple dozen sources can give Deep Web its credentials and get a version of the search engine tailored to those subscriptions. For individual users without subscriptions, the company plans to roll out services that let them buy monthly subscriptions or purchase the articles they find through Deep Web’s search portals, and Deep Web can use its group buying power to save them a bit on per-article prices or subscriptions. Those services should be available in the October-November timeframe.

Meanwhile, Deep Web continues to work on an important issue that has consumed much of its bandwidth over the last couple of years: federated search doesn’t scale the way a traditional crawling and indexing search solution does. A new version will let the company scale federated search significantly beyond its usual limits, Despain says.

“There are numerous technical problems. The sources themselves tend not to scale. They tend to have moderate or limited resources. So we need the ability to scale well beyond what a source can do,” he says. Mednar will probably be the first to benefit from the new infrastructure, followed by Biznar in the next quarter. The company is also considering a deep web search portal for the legal industry.

With its various customer groups all focused on knowledge discovery, Deep Web helps them in their hunt by clustering results. Its alerts can be set to deliver daily, weekly or monthly topic information that is “deduped,” so users don’t see the same result twice. But Despain says customers are also looking for more advanced tools to help them discover information. “They want sort of guided signposts to the deep web,” he says. So the company is looking at tools that help users understand where certain information might be located and then visualize it.
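To make the “deduped” alert idea concrete, here is a minimal sketch of one way an alert service might avoid sending the same result twice: it remembers a normalized key for every result already delivered and filters each new batch against it. The class names and the normalization rule (case-folded title with whitespace collapsed) are assumptions for illustration only; the article does not say how Deep Web matches duplicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    title: str
    url: str


def dedup_key(hit: Hit) -> str:
    """Hypothetical normalization: case-folded title with whitespace collapsed.

    Different sources often list the same article under different URLs,
    so this sketch matches on the title rather than the link.
    """
    return " ".join(hit.title.casefold().split())


class AlertDeduper:
    """Remembers which results a user has already been sent across alert runs."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def new_results(self, hits: list[Hit]) -> list[Hit]:
        fresh = []
        for hit in hits:
            key = dedup_key(hit)
            # Drops repeats within this batch and results sent in earlier batches.
            if key not in self.seen:
                self.seen.add(key)
                fresh.append(hit)
        return fresh


if __name__ == "__main__":
    deduper = AlertDeduper()
    monday = deduper.new_results([
        Hit("Gene Therapy Trial Results", "https://a.example/1"),
        Hit("gene therapy  trial results", "https://b.example/9"),
    ])
    tuesday = deduper.new_results([
        Hit("Gene therapy trial results", "https://c.example/2"),
        Hit("New Imaging Method", "https://a.example/3"),
    ])
    print([h.title for h in monday], [h.title for h in tuesday])
```

Title matching is a deliberately simple choice; a production system would likely combine several signals (identifiers such as DOIs, authors, dates) to avoid merging distinct items that happen to share a title.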

“What that really is is that users don’t know what they don’t know,” he says. “They may be interested in some types of data and don’t know where to buy it or where to get it. Very often one of the interesting things is that there’s data that our customers use that they might be paying for now that is freely available on the deep web in another source.”

One way Deep Web’s technologies may someday intersect with semantic web standards: creating an RDF interface to all the deep web sources out there, which those sources may not have the resources to build themselves.

“There is a lot of discussion in deep web publishing from the source side about how to leverage the semantic web and apply it to existing databases, since a lot of their databases are very rigorous to begin with,” he says. “In many ways the semantic web and deep web are overlapping circles.”
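What might an “RDF interface” to a deep web source look like at its simplest? The sketch below, offered as an illustration rather than anything Deep Web has announced, turns one database record into RDF statements in the N-Triples format, one subject-predicate-object triple per field. The base IRI `http://example.org/source/`, the `record/` and `vocab/` paths, and the field names are all hypothetical; a real mapping would reuse established vocabularies such as Dublin Core.

```python
from urllib.parse import quote


def literal(value: str) -> str:
    """Render a string as an N-Triples literal, escaping the characters N-Triples requires."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_ntriples(record: dict[str, str], base: str, id_field: str) -> list[str]:
    """Map one database row to N-Triples lines (hypothetical IRI scheme).

    The row's identifier becomes the subject IRI, each other column name
    becomes a predicate IRI, and each column value becomes a plain literal.
    """
    subject = f"<{base}record/{quote(record[id_field], safe='')}>"
    lines = []
    for field, value in record.items():
        if field == id_field:
            continue  # the identifier is already encoded in the subject IRI
        predicate = f"<{base}vocab/{quote(field, safe='')}>"
        lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines


if __name__ == "__main__":
    row = {"id": "42", "title": "Aspirin and heart health", "journal": "Example Journal"}
    for line in record_to_ntriples(row, "http://example.org/source/", "id"):
        print(line)
```

Because the mapping is mechanical, a third party could in principle run it over many sources' query results, which is the kind of work the article suggests smaller sources may lack the resources to do themselves.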
