From Information to Knowledge: Efficient Development of Semantic Applications


Executive Summary

Extracting applicable information from the abundance of available data costs companies a great deal of time. The same applies to consumers searching the Internet for an answer to a question. Semantic technologies promise to take on a large part of this task. iQser’s Middleware provides the basis for the efficient development of such intelligent applications. Its bottom-up approach makes this possible by means of a fully automatic semantic analysis.

Introduction

Instead of directing people to the information, semantic technology brings the information to people, tailored to their situational requirements. Searching, the usual way of working with information systems, is no longer necessary. For a sales manager, the situational requirement may be a meeting that he or she has to prepare for, or a customer for whom a proposal or quotation has to be prepared. For a product manager, it may be a project task or a product that requires a marketing concept. The system prepares the relevant information accordingly. This corresponds to a new organization of data: organization by connections, which are bound neither by data storage boundaries nor by the distinction between internal and external data or between different formats.

Content relations not only allow quick access to relevant information but also yield new results of economic importance: knowledge from previously solved problems and concluded projects can be applied to new tasks, revealing potential synergies between similar projects, suitable templates for new documents and concepts, or unexpected aspects or information with regard to a research and development topic.

However, it is valuable to evaluate not only direct connections but also the content network in its entirety. In this way, information can be detected that is identified only through several direct and indirect links. New client groups, for example, can be identified in this manner. These clients may be interested in products that are specified in relation to further products or specific suppliers. From there, it takes little effort to discover other characteristics of the selected clients by means of the linked information.

A further effect of content analysis is the presentation of knowledge from an information portfolio on a higher level of abstraction. The analysis extracts the central concepts (topics, names and terms) and how they relate to each other. This brings several benefits at once. Users receive an overview of the subject areas and the aspects connected to them. Likewise, central facts are conveyed. In pharmaceutical research, for example, the analysis could indicate the interdependency of certain proteins, the symptoms of a disease or the side effects of a drug. Furthermore, a concept map may be used to classify information or to narrow a request for suitable information.

A Bottom-Up Approach Saves Development and Maintenance Costs

The effectiveness of cross-company applications, semantic ones in particular, is often paid for with the effort required for development and maintenance. iQser’s Middleware avoids this problem by using a bottom-up approach that extracts information from the data itself. A top-down approach, by contrast, first requires the creation of a knowledge model (e.g. in the form of an ontology), which is then used to interpret and structure the available data. Training algorithms on a test database or on annotated classifications is similar to the top-down model.

Instead, the iQser GIN platform starts with the semantic integration of information sources. The interface provides a so-called content provider for every data format, which transforms each piece of content into a semantically standardized generic object with an open-ended list of descriptive attributes. Through this standardization, the system recognizes, for example, that a piece of content is the description of a person. Both structured and unstructured content can be included in this way with minimal effort. The overall content analysis takes place within the iQser GIN platform, without any extra effort.
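The idea of a content provider can be illustrated with a minimal sketch. It is not the iQser GIN API: the class ContentObject, the two provider functions and all field names are assumptions made for illustration. The sketch only shows how a structured source (a CRM row) and an unstructured source (a text file) could both be mapped to one generic object type with an open-ended attribute list.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """Hypothetical generic object: a type plus an open-ended attribute list."""
    object_type: str                       # e.g. "person" or "document"
    source: str                            # which data source it came from
    attributes: dict[str, str] = field(default_factory=dict)


def crm_contact_provider(row: dict[str, str]) -> ContentObject:
    """Illustrative provider for a structured source (one CRM contact row)."""
    return ContentObject(
        object_type="person",
        source="crm",
        attributes={"name": row["full_name"], "company": row["account"]},
    )


def text_file_provider(path: str, text: str) -> ContentObject:
    """Illustrative provider for an unstructured source (a plain-text file)."""
    return ContentObject(
        object_type="document",
        source="file",
        attributes={"path": path, "body": text},
    )


# Both sources end up in the same generic format.
contact = crm_contact_provider({"full_name": "Ada Example", "account": "Example GmbH"})
note = text_file_provider("notes/visit.txt", "Meeting with Ada Example about a quotation.")
```

Because every provider returns the same object type, downstream analysis code in this sketch never needs to know which format the content originally had.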

Once the information sources are included, complex applications can be defined quickly, even when data is fragmented and processes span organizations. For this purpose, the iQser GIN platform provides the Uniform Information Layer, which enables central access to all information and its connections. Application developers do not need to worry about where the information comes from, in which format it exists or which data model is applied. All content is delivered in a generic object format according to a unified semantic standard and description. Semantic searches are also supported. Content connections and abstract knowledge obtained from the information portfolio are requested via the very same interface. It is therefore also possible to define new search routes that are limited to a particular context.

Another central interface serves to intercept events, which are triggered by a change in a data portfolio or by new analysis results, for example when a new relationship between pieces of content has been discovered. This function can be used to launch or control processes or to send specific messages to people. Here too, the semantic information network can help with selecting the groups of people. Developers are free to work with process definitions from other providers.
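How an application might react to such events can be sketched with a simple publish/subscribe pattern. The EventBus class, the event names and the payload fields below are hypothetical; the text only states that data changes and new analysis results trigger events that can start processes or send messages.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

# Hypothetical event names, chosen for this sketch only.
CONTENT_CHANGED = "content_changed"
RELATION_DISCOVERED = "relation_discovered"

Handler = Callable[[dict], None]


class EventBus:
    """Minimal publish/subscribe hub standing in for the event interface."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Call every handler registered for this event type.
        for handler in self._handlers[event_type]:
            handler(payload)


outbox: list[str] = []  # stands in for a real messaging system


def notify_account_manager(payload: dict) -> None:
    """Example reaction: queue a message about a newly found relation."""
    outbox.append(f"New link: {payload['subject']} -> {payload['object']}")


bus = EventBus()
bus.subscribe(RELATION_DISCOVERED, notify_account_manager)
bus.publish(RELATION_DISCOVERED, {"subject": "Client A", "object": "Proposal 17"})
```

In a real deployment the handler would start a workflow or send a message rather than append to a list.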

Automatic Analysis with Integrated Self-Optimization

When the Middleware detects a piece of content, it is indexed and analysed immediately. The analysis of the semantic network of content (object map) is automatic and does not require the recalculation of the entire data portfolio. This permits almost real-time reactions and keeps computational requirements low. Three processes are combined in the analysis: first, connections are detected on the basis of key attributes, such as people’s names; second, the similarity of the subject matter of two content objects is determined; and third, content connections are derived from how the content is used. The latter produces a "learning effect" on the part of the system, which recognises in practice which information is particularly important to the user in a certain respect and which information is declining in importance. Further stages of analysis may be added via another interface.
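The three kinds of analysis can be illustrated with a small sketch. The source does not say which algorithms the Middleware uses, so everything here is a stand-in: an exact match on a "person" attribute for key-attribute links, bag-of-words cosine similarity for subject-matter similarity, and a simple counter of joint use for the usage-based signal. All function and field names are hypothetical.

```python
import math
import re
from collections import Counter


def shares_key_attribute(a: dict, b: dict, key: str = "person") -> bool:
    """Process 1 (stand-in): link two objects that carry the same key attribute."""
    return key in a and a.get(key) == b.get(key)


def term_counts(text: str) -> Counter:
    """Lower-case word counts used as a crude subject-matter profile."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Process 2 (stand-in): cosine similarity of two bag-of-words vectors."""
    va, vb = term_counts(text_a), term_counts(text_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Process 3 (stand-in): count how often two objects are used together.
usage: Counter = Counter()


def record_joint_use(id_a: str, id_b: str) -> None:
    usage[frozenset((id_a, id_b))] += 1


doc1 = {"id": "d1", "person": "Ada Example", "text": "quotation for server hardware"}
doc2 = {"id": "d2", "person": "Ada Example", "text": "server hardware proposal"}

linked_by_person = shares_key_attribute(doc1, doc2)
similarity = cosine_similarity(doc1["text"], doc2["text"])
record_joint_use("d1", "d2")
record_joint_use("d2", "d1")  # order does not matter
```

Each of the three signals is computed for one pair of objects at a time, which is consistent with the statement that new content can be analysed without recalculating the entire data portfolio.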

The object map contains subject-predicate-object connections. This means that each connection between pieces of content, and its origin, is described in detail. One person can be linked to another, for example in the relation of client or employee. Furthermore, the relationship between two content objects can be quantified. This measure represents the relevance, or rather the similarity, of one piece of content’s subject matter in the context of another piece of content. Such additional information can be used to select and filter content, protecting the user from a flood of information.
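A weighted subject-predicate-object connection and a relevance filter might look as follows. This is a sketch under assumptions: the Relation class, the predicate names, the weight range of 0 to 1 and the threshold are all invented for illustration and do not describe the actual object map format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical weighted subject-predicate-object connection."""
    subject: str
    predicate: str
    obj: str
    weight: float  # assumed relevance or similarity score between 0 and 1


object_map = [
    Relation("Ada Example", "is_client_of", "Example GmbH", 1.0),
    Relation("Proposal 17", "is_similar_to", "Proposal 9", 0.82),
    Relation("Proposal 17", "is_similar_to", "Newsletter 3", 0.12),
]


def related(subject: str, relations: list, min_weight: float = 0.5) -> list:
    """Return connections of a subject above a threshold, strongest first."""
    hits = [r for r in relations if r.subject == subject and r.weight >= min_weight]
    return sorted(hits, key=lambda r: r.weight, reverse=True)


strong_links = related("Proposal 17", object_map)
```

With the default threshold, only the connection to "Proposal 9" is returned for "Proposal 17"; the weakly related newsletter is filtered out, which is the kind of protection from information overload the text describes.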

Sample Applications: Market Monitoring and Semantic CRM

To conduct market research, a company consults and evaluates a variety of sources, and as a rule the information must be compiled from several of them. These include publications by the media, trade associations, analysts and market research institutes. The Internet is also becoming an important source, with competitors’ websites, consumer portals, blogs and forums. The effort required from an employee or agency to evaluate this wide range of information is correspondingly enormous.

The semantic analysis therefore first filters all information and sorts it according to the subjects that are relevant to the company. Then the significant concepts, such as subjects, brand names, companies and products, are determined on the basis of connecting key terms. In this way, new market participants, technologies or trends (e.g. environmental protection) can be identified as they gain importance for consumers in relation to a product category. When new or monitored subject constellations occur, selected people or groups of people can be informed with a message. The system can recognize, for example, whether prominent newspapers are all reporting on a subject or whether a key term is used in connection with a brand name.

Finally, it is possible to compile quantitative and qualitative analyses. A quantitative analysis helps a company determine how often its brand is mentioned in the press or on the Internet. A qualitative assessment, on the other hand, would summarise a product’s ratings in the media, in the blogosphere or on consumer portals.
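The quantitative side can be illustrated with a very small sketch that counts brand mentions per source type. The sample documents, the brand name "ExampleBrand" and the function count_mentions are made up for illustration; a real system would work on the analysed content objects rather than on raw strings.

```python
import re
from collections import Counter


def count_mentions(documents, brand):
    """Count case-insensitive whole-word mentions of a brand per source type."""
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    counts = Counter()
    for source, text in documents:
        counts[source] += len(pattern.findall(text))
    return counts


# Invented sample data: (source type, text) pairs.
documents = [
    ("press", "ExampleBrand launches a new laptop. Analysts praise ExampleBrand."),
    ("blog", "I compared examplebrand with two competitors."),
    ("forum", "Any experience with other vendors?"),
]

mentions = count_mentions(documents, "ExampleBrand")
total_mentions = sum(mentions.values())
```

A qualitative assessment, such as summarising ratings, would need sentiment or rating extraction and is beyond the scope of this sketch.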

In a project for a large German computer retailer, a GIN server, a CRM system, a document server and selected Internet sources were combined in order to determine the content relations. Users were able to view all relevant content from the connected sources according to their situational requirements. When dealing with a specific client, they received an overview of the documents referring to this client and of web content describing it. Even unexpected connections within the CRM system were discovered, such as competing companies that represented potential sales channels.
Sources were accessed by means of data connectors, which (in the case of the Internet) support different formats (HTML, microformats, RDF, amongst others), web service interfaces and standards such as OpenSocial for Web 2.0 services.

On the basis of semantic Middleware, even complex applications for evaluating the content of fragmented data can be created quickly. Once the data sources have been connected, the different applications can use the Uniform Information Layer for central access. No additional analysis effort is required.

Wurzer - Figure 1

The picture shows an evaluation of a subject area from Wikipedia. On the left is a concept map in the form of a word tree. Selecting a branch leads to a list of Wikipedia articles, sorted by relevance. On the right-hand side is an object map in the form of a list of articles whose content relates to the selected article in the middle. The pop-up window shows the evaluation of the selected connections as well as their explanation.
 
