Archives: October 2007

The Semantic Standards Gap

Jennifer Zaino
SemanticWeb.com Contributor

A number of efforts underway at the Object Management Group, a non-profit standards-setting industry consortium, have important implications for the development of semantic technologies.

The OMG looks at the semantic technology space from a standards perspective, with the idea in mind that, to make this all work, organizations will have to be able to take existing artifacts — whether they are in databases, in text documents, in software, and so on — and at least partway automate the process of using them for developing semantic ontologies.

“If you can’t automate it, you can’t get the traction needed to be useful,” says Elisa F. Kendall, co-chair of the OMG’s ontology group and CEO of Sandpiper Software, Inc. Sandpiper provides context modeling and transformation products that work with enterprise application and data integration tools and services to facilitate collaboration and information sharing among multiple databases, applications, and users in multi-vendor environments.

“The OMG is all about interoperability, and secondly about automation through modeling, and how you can leverage existing artifacts and re-factor them and model them better to get new capabilities out of them, or even to develop new stuff that is structured well and can be reusable because you modeled it that way.”

Kendall, who is also an active member of two working groups at the W3C, Semantic Web Deployment and the new OWL working group, became involved in the OMG ontology group because of Sandpiper’s work in developing a graphical way for people to design ontologies. UML, one of the primary standards from the OMG, is a graphical model notation that has a huge following among software developers, making it a good candidate to leverage for the development of semantic technologies, she says. Thus, UML became a lynchpin of ODM, the OMG’s Ontology Definition Metamodel, its standard for model-driven ontology development. ODM is actually a family of metamodels that make it possible to use UML in conjunction with RDF and OWL, as well as topic maps and Common Logic, a first-order logic language with expressive capabilities that OWL by itself lacks.

“Ultimately, all software and service-development processes will have to take a model-driven approach for scalability purposes,” Kendall says, as technologies such as RFID build up the data in corporate vaults. “There is no way to hard code all that. You have to be able to generate software, and common semantics should drive it.”

As Kendall explains it, the Ontology Definition Metamodel helps create a bridge over a gap that the W3C standards don’t address. “They don’t address traditional software engineering. They only address the stuff that is relevant for the web. That’s a big gap,” she says. ODM “lets you use the semantics with UML so that you can integrate them into your general software modeling and software engineering framework.”

Another gap is around ER diagramming, for building databases, which is where OMG’s new IMM (Information Management Metamodel) standard comes in. Using technologies that support this standard, organizations may find a way out of their database nightmares. So many large companies are running thousands of databases that they can’t turn off for operational reasons, but neither do they really understand what’s in them or how they work, and the people who built them are long gone.

“To understand what you have you must extract the semantics,” says Kendall. “Then you have to manipulate those semantics using semantic technologies, to clean them up, make them consistent, understand them, and then use them as a basis for going back to the database world and mapping across your databases,” she says. As a result of cracking this nut, companies potentially could turn off many of these mystery databases and save the millions of dollars they spend annually to maintain them. “People outside of the research community didn’t get that bridge between the database development world and the semantic world until relatively recently,” Kendall says.

Another gap to bridge: the business rules community. Business rules engines, Kendall notes, traditionally have been designed by business analysts trying to automate processes or individuals working on specific applications, like fraud detection in banking. Rarely do they use the same terminology or vocabulary, and often neither database nor IT personnel have a hand in their development. “The bridge between business rules development and other places in the business should be based on the same vocabulary, but because they are silo’d and maintained by different organizations, it’s not,” she says. “Through ODM we can make connections to emerging business rules standards coming out of the W3C and OMG.” These include PRR (Production Rule Representation) and SBVR (Semantics of Business Vocabulary and Business Rules).

Buttoning down standards and specifications is only half of it, though.

“The really important work is still to come,” says Kendall. “It’s the products that come out of these mappings that let you take advantage of semantics in other kinds of software engineering activities, in database alignment, in eliminating redundancy, or other kinds of things like that that will make this real to more people.”

Sandpiper, for instance, currently has a tool for building ontologies in UML, but it’s also working now on a second-generation framework that will incorporate more automation, such as automated integration with reasoning engines for validity and consistency checking, and also integration with back end knowledge bases.

A Snapshot of Semantic Web Trends

Jennifer Zaino
SemanticWeb.com Contributor

This month Jorge Cardoso of the Department of Mathematics and Engineering at the University of Madeira in Funchal, Portugal, published a paper entitled “The Semantic Web Vision: Where Are We?”

Cardoso is well-qualified to ask the question. A Ph.D. in computer science, he’s published more than 60 papers in the areas of workflow management systems, semantic web, and related fields, and he was the co-organizer and co-chair of the three International Workshops on Semantic and Dynamic Web Processes.

To answer the question, Cardoso conducted a survey of 627 participants between December and January, based on 14 questions related to particular aspects of the semantic web and its technologies. The survey covered the following categories: tools and languages, meaning the tools used for building ontologies and the ontology languages used; ontology, which asked which domain or industry was affected, what methodology was used, and why and how ontologies were aligned and integrated; ontology size; and production, which looked into timeframes for developing ontologies and putting systems to work.

One finding, perhaps unsurprising given where much semantic web activity takes place, concerns the industries for which respondents were representing knowledge with ontologies: education took the lead (31 percent), followed closely by computer software (28.5 percent), with government (17 percent), business services (17 percent), and life sciences (16.5 percent) next in line. Communications (13 percent), the media (12.8 percent), and healthcare providers (11.3 percent) were the only other industries to see double-digit percentages.

Another finding is that respondents are using ontologies mainly to share a common understanding of the structure of information among people or software agents, so models can be understood by humans and computers. In total, just about 12 percent use ontologies for code generation, data integration, data publication and exchange, document annotation, information retrieval, search, reasoning, annotating experiments, building common vocabularies, web service discovery or mediation, and enabling interoperability.

“The language with the strongest impact in the semantic web is without a doubt OWL (which is derived from DAML+OIL and builds upon the resource description framework),” writes the author, noting that more than 75 percent of ontologists have selected this language to develop their ontologies. Yet, he also notes that description logic and FLogic have a penetration rate of 17 percent and 11.8 percent, respectively, and that RDF(S) and DAML+OIL have a penetration rate higher than 64 percent and 12 percent, respectively. Cardoso concludes that “the study shows that the semantic web does not even need OWL and can achieve important objectives such as data-sharing and data-integration using just RDF alone.”

When asked which method they used to develop ontologies, “We were overcome by the percentage of respondents (60 percent) that develop ontologies without using any methodology,” the author writes. Another surprising finding for the author is that “the ontologies being developed are much smaller in size than can be ascertained from many research papers and conference keynotes and talks.”

According to the report, each respondent was asked to indicate the average size of the smallest, typical, and biggest ontologies they were working with. Nearly 75 percent said that their smallest ontologies had fewer than 100 concepts, and about 20 percent said they had between 100 and 1,000 concepts.

“When asked about typical ontologies, 44 percent of respondents stated that such types of ontology had between 100 and 1,000 concepts, and 35 percent considered that typical ontologies in their organization have less than 100 concepts. Finally, when asked about the biggest ontologies being deployed, the majority of respondents, 33.5 percent, considered that this type of ontology had between 100 and 1,000 concepts.”

When will ontology-based systems be put into production? While Cardoso acknowledges that mainstream adoption of the semantic web is still five to ten years away, 70 percent of the people surveyed working on the semantic web are “committed to deploying real-world systems that will go into production in less than 2 years,” he concludes.

While nearly 30 percent actually have no plans to use such systems in the future, “more than twenty five (25.4 percent) of the respondents indicated that their organization was currently active in the development and installation of ontology-based systems. Almost 21 percent stated that they will put their ontology-based systems into production within 6 months, while 13.7 percent will wait one more year. About 12 percent will deploy ontology-based systems within 18 or 24 months.”

The Start of Twine

Yesterday I was able to spend a few minutes on the phone with Nova Spivack, CEO of Radar Networks. If you’re reading this newsletter, you already know that Radar Networks announced its closed beta for Twine, its semantically enabled collaboration and information system.

Smartening Up Your Links

Jennifer Zaino
SemanticWeb.com Contributor

AdaptiveBlue today announced SmartLinks, which it bills as a way to bring the power of context and semantics to blogs and web sites.

Turning an ordinary link into a semantic link lets bloggers and web site operators bring “the best of the web to the site, and it lets users get to the best information about this object quickly,” says Alex Iskold, founder and CEO of the smart browser and personalization vendor.

SmartLinks are automatically inserted for links to pages about specific categories: books, music, movies, stocks, recipes, restaurants, gadgets, people, wine, and so on. The semantic angle is knowing the kind of object a person is linking to, so AdaptiveBlue can infer other links that make sense and then integrate all the related information in one place.

“We are taking a link that is like an atom of the web and saying, ‘What does it take to bring semantics to the link?’” says Iskold.

AdaptiveBlue has put together a vertical search engine specific to each category. For books, for example, it picked some 30 sites, but in some categories, it draws from hundreds of sites to help readers connect to relevant information from around the web. Iskold says AdaptiveBlue chooses web sites to include based on traffic rankings from sources such as Alexa, while it came up with its blog lists based on manual investigations. That includes finding which blogs in particular categories have been well reviewed in other publications, and which blogs those blogs themselves link to. Adding additional sites as new ones gain popularity can be quickly accomplished, Iskold says.

The first time someone anywhere on the web clicks on a SmartLink — say, to a book — AdaptiveBlue processes the underlying page or calls the web operator’s API, depending on what kind of URL it is. Iskold says this builds the groundwork for users elsewhere to leverage AdaptiveBlue’s having collected the semantics of the object and put it in its database. Think of the next steps that can come from knowing that on this page in Amazon there is a book called The Road and on Barnes & Noble there is another page with the same book, he says.

“That becomes a powerful concept where you know you went to the book on Amazon and I went to Barnes & Noble — computers now don’t know that we looked at the same book, but what if we connected that? That’s where you will see the next steps coming from us,” Iskold says.
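The connection Iskold describes can be pictured as a lookup keyed on a shared identifier. The sketch below is illustrative only and is not AdaptiveBlue's implementation: the URLs, the placeholder ISBN, and the `canonical_key` and `group_by_object` helpers are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical page records: (url, kind of object, identifier as printed on the page).
PAGES = [
    ("https://books.example-a.com/item/123", "book", "ISBN 0-000-00000-0"),
    ("https://shop.example-b.com/p/987", "book", "isbn 0000000000"),
    ("https://music.example-c.com/album/55", "album", "UPC 000000000000"),
]


def canonical_key(kind, identifier):
    """Normalize an identifier so the same object gets the same key on every site."""
    cleaned = "".join(ch for ch in identifier if ch.isalnum()).lower()
    return (kind, cleaned)


def group_by_object(pages):
    """Map each canonical object key to the list of URLs that describe it."""
    groups = defaultdict(list)
    for url, kind, identifier in pages:
        groups[canonical_key(kind, identifier)].append(url)
    return dict(groups)


groups = group_by_object(PAGES)
# Both bookstore pages end up under the same key, so visits to either
# page can be recognized as visits to the same book.
same_book = groups[("book", "isbn0000000000")]
print(same_book)
```

Keying on a normalized identifier is only one possible design; a real system would also need a fallback, such as title and author matching, for pages that expose no identifier.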

SmartLinks integrate with AdaptiveBlue’s BlueOrganizer, letting users of that technology instantly save the books, music, or other information from SmartLinks, and have them become available in the BlueOrganizer Sidebar where users organize and search items. “At a high level we have a browsing technology and a publishing technology, so we are injecting semantics through consumers and their browsing experience and through publishers,” says Iskold. “You make the browser smart and augment pages with semantics, and once you connect them, you will have an integrated web effect. That’s what we are shooting for.”

Iskold sees a big difference between the efforts of AdaptiveBlue and those of Freebase or Radar Networks, which just publicly demonstrated its Twine technology.

“What I have argued and will argue with them in the future is that they are like silos and we are doing much more of a distributed approach,” he says. “We’re trying to do what’s actually useful, and then the semantics is not the end but the means to the end. That’s the whole difference in the approach.”

Radar’s Twine Ties the Semantic Web Together

Jennifer Zaino
SemanticWeb.com Contributor

Last week, Radar Networks moved out of stealth mode with the public preview of its semantic web-based online service dubbed Twine. Radar is calling Twine a knowledge networking service, designed to help consumers, professionals and enterprises share, organize, and find information.

A consumer might, for example, use the site to keep track of or find new things about a special interest or hobby, while a professional or member of an enterprise team can use it to work on projects with a customer or other team members inside or outside the company, bringing together and organizing all the email and information related to those projects. Unlike groupware or knowledge management systems — monolithic platforms that are not easily used between organizations — Twine provides a way to share knowledge and collaborate across boundaries.

Twine uses semantic web technologies and natural language processing to learn about each user and his interests, in order to build a set of concepts that connect him to data related to those interests. Rich contextual information is added to turn it into semantic content, which can be shared with other groups or people, to make it easy to conduct very specific searches — say, for a video about venture capitalists that are interested in green investing. The service also can rank the results of searches by their relevance to the particular individual — giving higher value to content that was recommended by a friend, for example.
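The personalized ranking described above can be sketched as a base relevance score plus a boost for items a friend recommended. This is a minimal illustration under stated assumptions, not Radar Networks' algorithm: the `rank` function, the `friend_boost` value, and the record fields are hypothetical.

```python
def rank(results, friends, friend_boost=0.5):
    """Order results by relevance, lifting items recommended by one of the user's friends."""

    def score(item):
        recommended_by_friend = bool(friends & set(item["recommended_by"]))
        return item["relevance"] + (friend_boost if recommended_by_friend else 0.0)

    return sorted(results, key=score, reverse=True)


# Hypothetical search results with a precomputed relevance score.
results = [
    {"title": "Green investing panel", "relevance": 0.8, "recommended_by": []},
    {"title": "VC interview on clean tech", "relevance": 0.6, "recommended_by": ["dana"]},
]

for_dana_friend = [r["title"] for r in rank(results, friends={"dana"})]
for_stranger = [r["title"] for r in rank(results, friends=set())]
print(for_dana_friend)
print(for_stranger)
```

The same two results come back in a different order for different users, which is the effect the article attributes to ranking by relevance to the individual.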

Radar founder and CEO Nova Spivack freely notes that there are plenty of services out there that enable individuals to collect information, from Wikis to bookmarks. “But it would be extremely difficult to put the level of intelligence and power that we have built. There is a qualitative difference over this and Del.icio.us or Lotus Notes and a Wiki.”

Twine can be complementary to other semantic services such as Metaweb’s Freebase database of public information and Powerset’s natural language search interface, integrating with the former’s data sources and using the latter to improve searches.

According to Spivack, Twine provides the end user piece in the equation. Twine is where “individuals and groups actually can start to use the semantic web,” he says. “The other tools plug in. We are the central place where information and digital life come together in Web 3.0 and the semantic web. But we want to work with other companies that are adding value that we think will make Twine better.”

There’s also potential for media and product companies to use Twine as an on-ramp to the semantic web, says Spivack, providing a way to get their product and content into semantic form to develop richer relationships with their customers.

“The platform we created is hard to make, and nothing else can really do this today. It will be five years or more before these kinds of capabilities are available anywhere else,” Spivack says, noting the company has a number of patents around semantic advertising, personalization, web crawling, mining, and search, among other things in this space. “In the meantime, this is going to be probably the most cost-effective and high visibility way to do it.”

Written in pure Java, Twine is a full platform, down to the storage layer, for efficient and scalable storage for RDF data and queries, with capabilities for managing, creating, and sharing ontologies, support for building knowledge bases and managing privacy and identity and user accounts, relationships, groups, communities and teams. Radar has also built statistical analysis, graph theory and social search into the model, and above that semantic advertising, user profiling, personalization, as well as APIs to get data in and out of Twine.

Twine is now moving into its beta stage of testing, which will see some 10,000 selected users live on the platform. The next opportunity for more users to participate will be when Radar widens the scope in the spring.

SmartMenu: Getting the Browser to Understand

Alex Iskold
SemanticWeb.com Contributor

More than a year ago I wrote an article in the Web 2.0 magazine entitled Smart Browser, Where Art Thou? Here is what I said back then:

It is obvious that memory plays a critical role in human intellect and human interactions. Yet today our interactions with computers, and the web in particular, are disappointingly stateless. We keep going back to the Google search box and re-entering the same stuff over and over again. The computer simply has no idea what we are looking for and how to help us find it. Ah, you’d say, but how can it? Don’t we need artificial intelligence for that? My claim in this article is that no, we do not. Instead, we need to get inspiration from complexity science and focus on usability and productivity.

I went on to describe the first problem with today’s browser: the lack of understanding of everyday objects. As soon as we bookmark something, the semantics is lost. The computer does not really know that the link represents a movie we liked, a book we just finished or a glass of wine we enjoyed. Because the computer does not know, it cannot be helpful. The whole reason why people are helpful to each other is because we have a common basic understanding of what things are.

I argued, though, that for the browser to be helpful, it does not need to have the same kind of understanding that we have. It all depends on what problem you are trying to solve. Here at AdaptiveBlue, we are trying to solve the following problem (at least for now):

How can the browser and the user get to the relevant information faster?

This is seemingly a simple thing, but it really isn’t. The relevancy implies specificity to a person and the context. Faster means with less search and less copy/paste and less clicking. How can the browser help us do this?

Over the past year we made the first step towards solving this problem — it is called the SmartMenu.

The SmartMenu is a set of context-sensitive shortcuts personalized based on the user’s browsing history. Context-sensitive means that entries change depending on what the user is looking at. The personalization means that the sites in the shortcuts are specific to this user. And yes, shortcuts are just that — shortcuts.
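As a rough illustration of those two properties, the sketch below picks candidate sites from the current context and orders them by how often this user has visited them. It is not AdaptiveBlue's code: the site names, the `SITES_BY_CONTEXT` catalogue, and the `smart_menu` function are assumptions made for the example.

```python
from collections import Counter

# Hypothetical catalogue of sites relevant to each kind of object.
SITES_BY_CONTEXT = {
    "book": ["books-a.example", "books-b.example", "reviews.example"],
    "movie": ["films.example", "reviews.example"],
}


def smart_menu(context, history, limit=2):
    """Return shortcuts for the current context, most-visited sites first."""
    visits = Counter(history)
    candidates = SITES_BY_CONTEXT.get(context, [])
    # sorted() is stable, so sites with equal visit counts keep catalogue order.
    ranked = sorted(candidates, key=lambda site: -visits[site])
    return ranked[:limit]


# Hypothetical browsing history for one user.
history = ["reviews.example", "books-b.example", "reviews.example", "films.example"]

book_menu = smart_menu("book", history)
movie_menu = smart_menu("movie", history)
print(book_menu)
print(movie_menu)
```

Looking at a book and looking at a movie produce different menus (context sensitivity), and a user with a different history would see a different order (personalization).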

Let’s see why this makes sense. Say there are 30 really popular sites for shopping reviews and social networking around books. Most of us would only use a couple of these sites regularly. When we search around for books we either use a search engine like Google or go to the sites directly and search within the site.

Semantic Web as Competitive Advantage

Jennifer Zaino
SemanticWeb.com Contributor

When Tom Ilube left his post as CIO of Egg, the U.K.’s first online bank, it was with the intention of founding another large-scale company aimed at meeting an emerging consumer need and built on an emerging technology with practical potential. The result is U.K.-based identity-protection firm Garlik, which weekly sweeps the web and presents to some 60,000 consumers a multi-sourced picture of potential identity fraud risks and how they might address them. It is underpinned by semantic web technologies.

Ilube, CEO of Garlik, is betting that this decade’s emerging technology will be as strategically important to the success of his company as Internet technologies were to Egg. Ilube first had to convince himself that the web in the next five years will shift fundamentally from the document web to a data web, where the meaning of that data is made explicit in some semantic form. Next, he had to determine whether semantic web technologies could scale to industrial-strength levels, given expectations that Garlik’s customer base will grow and the fact that personal information on the web is, at the least, doubling every year.

“What is important is, what combinations of information make you more or less exposed. It’s not just that I found your name there or date of birth or mother’s maiden name here, it’s that if all of those are available, even in different places, then suddenly I have enough information to take over your identity,” says Ilube. “We highlight that by looking at multiple sources online and give some meaning and context, to highlight what puts you at risk and whether that risk is low, medium, or high.”
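Ilube's point that the combination matters more than any single find can be sketched by pooling attributes across sources before scoring. The attribute names, the `HIGH_RISK_COMBO` set, and the thresholds below are hypothetical and are not Garlik's actual rules.

```python
# Hypothetical combination that would be enough to impersonate someone.
HIGH_RISK_COMBO = {"name", "date_of_birth", "mothers_maiden_name"}


def exposure_level(findings):
    """Rate exposure from a mapping of source -> set of attributes found there."""
    # Pool everything found anywhere; an attacker can combine sources too.
    exposed = set().union(*findings.values())
    matched = HIGH_RISK_COMBO & exposed
    if matched == HIGH_RISK_COMBO:
        return "high"
    if len(matched) == 2:
        return "medium"
    return "low"


# Each source alone looks harmless, but together they complete the combination.
scattered = {
    "forum.example": {"name"},
    "registry.example": {"date_of_birth"},
    "genealogy.example": {"mothers_maiden_name"},
}
partial = {"forum.example": {"name"}, "registry.example": {"date_of_birth"}}

print(exposure_level(scattered))
print(exposure_level(partial))
```

Scoring each source separately would rate all three pages as low risk, which is exactly the blind spot the quote describes.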

Ilube knew that he would need to scale to billions of triples, the relationships among entities expressed in RDF format — and today, Garlik’s semantic store scales to 60 billion of them, with capacity beyond that. The company has implemented its technology across about 100 lightweight, low-cost Linux boxes strung together so that it can easily scale horizontally, one server at a time.

“It’s very different architecting a large corporate system where essentially the boundaries are known, even if it’s a large company, versus designing and building a genuine web-based consumer system where you don’t know the boundaries,” he says.

The third and final question Ilube had to consider was perhaps the most important. “For the problem area I was engaged in (the question was), would this set of technologies give me a genuine advantage over other ways of trying to deliver solutions?” Ilube says. “I concluded they would.”

Not today, necessarily, but over the next few years — as customer demands evolve and the environment becomes more complex — Ilube expects that semantic technologies will provide the best foundation for Garlik to quickly deliver new services.

From his experience as a CIO, Ilube knows just how critical it is to build a flexible infrastructure that can change to meet new business requirements. Back at the bank, he recounts, whenever anyone wanted to change or add a field to the customer database, it would immediately cause a panic.

Metadata Management High on Gartner’s List

Jennifer Zaino
SemanticWeb.com Contributor

Metadata management ranks No. 4 on Gartner’s newly released list of 10 strategic technologies for 2008. It falls just behind Green IT, unified communications, and business process modeling, as a technology “with the potential for significant impact on the enterprise in the next three years. Factors that denote significant impact include a high potential for disruption to IT or the business, the need for a major dollar investment, or the risk of being late to adopt,” according to the research firm.

Gartner says about metadata management that, through 2010, organizations implementing both customer data integration and product information management will link these master data management initiatives as part of an overall enterprise information management strategy.

It calls metadata management “a critical part of a company’s information infrastructure. It enables optimization, abstraction, and semantic reconciliation of metadata to support reuse, consistency, integrity, and shareability. Metadata management also extends into SOA projects with service registries and application development repositories. Metadata also plays a role in operations management with CMDB initiatives.”

Indeed, the ability to successfully manage metadata will be key to the application of semantic web technologies at the enterprise level, as organizations attempt to reconcile their disparate information sources. The Web Ontology Language (OWL) provides a way to formally describe the meaning of terminologies in web documents and the relationships between those terms. For instance, it can state that a data field in one database labelled “resume” is the same as another data field elsewhere labelled “curriculum vitae.”
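The resume example can be sketched with plain tuples standing in for RDF triples. This is a simplified stand-in rather than a real triple store or OWL reasoner: `owl:equivalentProperty` is the OWL term for declaring two properties equivalent, while the `hr:` and `recruiting:` names and the `equivalents` helper are hypothetical.

```python
# Simplified stand-in for an RDF store: (subject, predicate, object) tuples.
EQUIVALENT = "owl:equivalentProperty"

TRIPLES = [
    # Hypothetical field names from two different databases.
    ("hr:resume", EQUIVALENT, "recruiting:curriculumVitae"),
]


def equivalents(term, triples):
    """Collect every term linked to `term` by equivalence, in either direction."""
    seen = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if predicate != EQUIVALENT:
                continue
            # Equivalence is symmetric, so follow the link both ways.
            for a, b in ((subject, obj), (obj, subject)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


fields = equivalents("recruiting:curriculumVitae", TRIPLES)
print(sorted(fields))
```

A query written against either field name can then be expanded to cover both databases, which is the kind of reconciliation the paragraph above describes.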

“The semantic web breaks down a lot of barriers for merging data from different sources together and presenting it in a coherent and holistic way,” says Eric Miller, president of semantic web startup Zepheira, which is helping businesses address data integration challenges with semantic web standards and knowledge management technologies. “The applicability (of these technologies) inside of enterprises is potentially enormous.”






Tools such as IBM’s Integrated Ontology Development Toolkit, for storage, manipulation, query, and inference of ontologies and corresponding instances; and TopQuadrant’s TopBraid Composer, which offers support for developing, managing, and testing configurations of knowledge models and their instance knowledge bases, are just two of the offerings out there (among a range of open source options, as well) that are addressing issues around metadata management.

Last year, webMethods also became a player in this space, with the acquisition of Cerebra’s semantic metadata management technology. (For a comprehensive listing of semantic tools for metadata management and other needs visit AI3 and a section called Comprehensive Listing of 250 Semantic Web Tools.)

“Semantic technology provides a new kind of language to help bridge the (business and technology) gap in a fundamental way. It creates a shared vocabulary for business over the space that both business and technology people interact with, and that vocabulary becomes part of the model — the ontology — that is able to be understood by human beings and computers,” says Robert Coyne, COO at TopQuadrant. That’s important in today’s rapidly evolving business climate, he says. “The rapid business cycle requires creativity at the edge, and empowering end users to do things on their own.”

Semantic Web’s Linux Parallel

Jennifer Zaino
SemanticWeb.com Contributor

Is the semantic web following the Linux playbook? In some ways, maybe.

The reference here is mainly to the fact that many enterprises were hesitant to adopt Linux as a platform for mission-critical applications, when their main option for service and support was the volunteer open-source community. Not to fault the open-source community, which has a reputation for helpfulness and responsiveness, but businesses tend to feel more secure putting their trust in contracts for specified services.

It was the commercialization of Linux distributions — which offered product support, training, and other services as well as affiliated infrastructure and systems management products — that opened the door to more mainstream and mission-critical deployments of the operating system. That model has been followed with some success by commercial open source applications, such as SugarCRM.

“There are a lot of parallels and anti-parallels to the Linux story,” says Dean Allemang, chief scientist at TopQuadrant, a semantic web consulting company and creator of the semantic web development tool TopBraid Composer and the semantic application deployment environment TopBraid Live. TopBraid Ensemble, a semantic application for collaborative information management, runs on TopBraid Live. The semantic web mirrors the Linux story in the issues that surround building trust around a new-fangled idea. It’s the old chicken-and-egg scenario: one way to gain trust is by demonstrating mission-critical solutions, which no one will build until they feel they can trust the environment and building platforms.

TopQuadrant is among the early set of companies helping to build that trust. The company counts among its employees VP of product development Holger Knublauch, Ph.D., who created the leading open source ontology development tool Protégé OWL while he was a researcher at Stanford University. Built in the early 90s, the tool so far has some 50,000 downloads to its credit, TopQuadrant says. But it wasn’t designed with support for the new W3C standards, and the open source system wasn’t up to managing industrial-strength projects, TopQuadrant says.

“People wanted a commercially supported version of the ontology editing environment they were accustomed to,” one that supported the new standards, says Allemang. Hence, the signing-on of Knublauch and the birth of TopBraid Composer, which has been available for about the last year and a half. TopQuadrant says that, because of this heritage, users have an easy upgrade path from Protégé to TopBraid Composer.

The TopBraid suite aims to break one of the barriers to widespread adoption of semantic web technologies — the lack, COO Robert Coyne says, of “an enterprise-class development and deployment environment to deploy scalable semantic applications.” What this technology enables, he says, is “model-based applications and the model-based deployment of systems for business solutions, where the models are living things that have explicit meaning encoded in them. They can be queried and they can evolve.” In other words, you can change what the application does by changing the model, not rewriting a program.
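The idea of changing behavior by changing the model can be shown with a toy example in which the rules live in data and the program only interprets them. This is a generic sketch, not TopBraid's architecture: the `model` dictionary and the `validate` function are hypothetical.

```python
def validate(record, model):
    """Return the fields the model marks as required that the record lacks."""
    return [
        field
        for field, spec in model.items()
        if spec.get("required") and field not in record
    ]


# Hypothetical model: the rules are data, not code.
model = {
    "name": {"required": True},
    "email": {"required": False},
}
record = {"name": "Ada"}

before = validate(record, model)

# Evolve the model; the validate() function itself is untouched.
model["email"]["required"] = True
after = validate(record, model)

print(before)
print(after)
```

Because the model is ordinary data, it can also be queried, for example to list every required field, which echoes Coyne's remark that such models can be queried and can evolve.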

According to TopQuadrant, there’s an explosion of interest in where the semantic web can take the enterprise, along with questions about how to make it all work. Organizations are becoming very interested in getting basic training from the company on what the semantic web is all about and on the standards around it, it says.

“They see this as exciting, but when they decide to do a deployment, it’s a tough situation,” says Allemang. “They have to figure out all the pieces of it. There’s a lot of good GPL software out there, but how does it fit in, what do I have to build myself for my specific requirements and how? That’s quite a daunting decision and it keeps a lot of people from going to genuinely high-profile deployments.”

ClearForest: a Top-Down Approach to Semantic Web

Alex Iskold
SemanticWeb.com Contributor

We’ve been writing recently about the rise of the semantic web and how in 2007 we’ll see many interesting semantic technologies. The fundamental problem that all these technologies need to solve is explaining the meaning of things to computers. There are several approaches to this, all of which in principle can work.

There are companies and technologies that are doing it bottom up, by embedding semantic annotations (metadata) right into the data. The opposite camp is exploring the top-down approach, which relies on analyzing existing information. The ultimate top-down solution would be a full-blown natural language processor, able to understand text the way people do.

In this post, we are going to look at ClearForest, one of the companies in the top-down camp. At first glance, you might not think much of the company’s web site, but a deeper dive reveals that ClearForest is restructuring to apply its core natural language processing technology to next-generation semantic applications. The fact that ClearForest has released both a Web Service and a Firefox extension that leverages an API to deliver the end-user application suggests that the company gets what the next-generation web is all about.

Gnosis — Firefox extension for annotating web pages with semantics

The first ClearForest product that we looked at was the Firefox extension called Gnosis. Here is how it is described on the Mozilla extensions page:

“With a single click, Gnosis will identify the people, companies, organizations, geographies and products on the page you are viewing. Using the built-in navigation sidebar you can gain immediate understanding of the page’s contents.”

Downloading and installing Gnosis was as easy as with any Firefox add-on. We used the Read/WriteWeb home page to try the extension. With one click from the menu, the page was filled with various types of annotations. The current version of Gnosis recognized Companies, Countries, Industry Terms, Organizations, People, Products and Technologies, an impressive range of things. Each word that Gnosis recognized was colored according to its category.

This was interesting, but overwhelming. A better approach would be to have the coloring appear on a mouseover or another gesture. But this is a usability nuance that will get polished in the next iteration of the product. Overall, I was impressed. In an instant, the page was analyzed and annotated. It was not perfect (it thought that all the Jasons on the page were Jason Briggs), but it was more accurate than I expected it to be.

Next I turned my attention to the sidebar. The extension created a categorized tree of all the words and phrases that it found on the page. We could expand and collapse each category to find the terms. It looked like vertical search for a single page. It was interesting, and would be very useful for blogs and lengthy pages.
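To get a feel for what such a sidebar holds, here is a toy sketch in Python. It is not how Gnosis or ClearForest's web service works: real natural language processing is far more sophisticated than this. The example simply looks up terms from a small hand-made table (a hypothetical `GAZETTEER`, with categories assigned for illustration) and groups the ones it finds by category.

```python
from collections import defaultdict

# Hypothetical lookup table of known terms and their categories. A real
# top-down system would infer these from the text instead of relying on a list.
GAZETTEER = {
    "ClearForest": "Companies",
    "Google": "Companies",
    "Firefox": "Products",
    "Wikipedia": "Organizations",
}

def build_sidebar(text, gazetteer):
    """Group every gazetteer term found in the text under its category."""
    tree = defaultdict(set)
    for term, category in gazetteer.items():
        # Naive substring match: good enough for a sketch, not for production.
        if term in text:
            tree[category].add(term)
    # Sort categories and terms so the tree displays in a stable order.
    return {category: sorted(terms) for category, terms in sorted(tree.items())}

page = "ClearForest released a Firefox extension that can search Google."
sidebar = build_sidebar(page, GAZETTEER)

for category, terms in sidebar.items():
    print(category)
    for term in terms:
        print("  " + term)
```

The resulting dictionary maps each category to the terms found under it, which is the kind of structure an expandable tree view can render directly.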

Again, the interface needs to evolve – but the idea that key terms and concepts on any page can be identified and organized in such a way seems compelling. In addition to the organization, the extension offered to search for any keyword on Google, Wikipedia or Technorati. If you are interested in a keyword, you are likely to want to find more related information. So the context search seems like a logical extension of categorization, as it makes this data further searchable.

This article first appeared on Read/WriteWeb.
