Archives: September 2008

Sociocast Unveils Personalization Platform

Jennifer Zaino
SemanticWeb.com Contributor

Sociocast Networks introduced this week its personalization and discovery technology, which will blossom first in a fashion site to be privately tested in October. Based on a combination of artificial intelligence and social network dynamics, the content delivery platform aims to leverage “associative memories” and “influencer networks” to help users find the information they want without having to create filters to encapsulate their desires.

“Sociocast is next-generation personalization,” says Ari Goldberg, co-founder and chief channel officer. He describes it as an expert system that provides content ranging from products to articles to persons to advertisements, based on the understanding it gains of individual preferences. If making those kinds of recommendations sounds like what Amazon does, it is … but, says Goldberg, to a more advanced degree.

“We aren’t claiming we are smarter,” he says. “We just ask the same questions with a new set of tools for more accurate recommendations based on algorithms we have put together.”

The company is moving forward with its proprietary technology for standardizing data and creating attributes around content, which lets Sociocast create relationships among objects and with events, people, or other content. “Everyone is talking about the semantic web and web 3.0 and the intelligent web, that it will all work once someone retags the entire web,” he says. “That’s a gigantic leap of faith, especially with exponentially more data coming online. We can’t be reliant on this, so we do it ourselves.”

Sociocast, he says, experiences your life, beginning with a registration process that starts the system off learning about you, then follows that up by capturing each click on the site as an event.

“Our brains don’t store data, they store associations,” Goldberg says in reference to the associative memory aspect of the technology. “So if I am on a fashion site and click on a Bottega bag — as a person you think this is pretty and I need this. But to our technology that is a collection of 300 attributes, the bag has certain weaving, leather, handles, etc.” Your passive experience looking at that item has enabled the site to accumulate information about certain characteristics that appeal to you, which helps it to curate your experience there and helps you discover the information you are most likely to want.
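
The attribute-accumulation idea can be illustrated with a small sketch. This is not Sociocast's algorithm, which the article does not detail; the item names, attribute sets, and the simple counting score below are all assumptions made for illustration.

```python
from collections import Counter

def build_profile(clicked_items):
    """Accumulate how often each attribute appears across clicked items."""
    profile = Counter()
    for attributes in clicked_items:
        profile.update(attributes)
    return profile

def rank_catalog(catalog, profile):
    """Order catalog items by the summed profile weight of their attributes."""
    def score(name):
        return sum(profile[attr] for attr in catalog[name])
    return sorted(catalog, key=score, reverse=True)

# Hypothetical click events: each click is reduced to a set of attributes.
CLICKS = [
    {"woven", "leather", "top-handle"},
    {"leather", "tote"},
]

# Hypothetical catalog of items the site could surface next.
CATALOG = {
    "woven leather clutch": {"woven", "leather", "clutch"},
    "canvas backpack": {"canvas", "backpack"},
    "nylon tote": {"nylon", "tote"},
}

PROFILE = build_profile(CLICKS)
RANKED = rank_catalog(CATALOG, PROFILE)
```

Here the clutch ranks first because it shares the most heavily weighted attributes with what was clicked, even though the user never searched for a clutch.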

OpenCyc Hooks Into Linked Data Web

Jennifer Zaino
SemanticWeb.com Contributor

Cycorp Inc., the developer of the Cyc Knowledge Server, a multi-contextual knowledge base and inference engine, has released a new version of the open source OpenCyc. The latest version now makes available as a download the entire OpenCyc ontology as an OWL file, as well as making all the OpenCyc content available as URIs.

Users can use the OpenCyc terms to represent web content. The company is putting forward these terms and a set of the relationships among them to serve as a shared vocabulary and knowledge-set to create meaningful information exchange among applications. With a shared vocabulary, Web applications can automatically reason about, and integrate, the content of Web sites and Web services, the company says.

The commercial product from which the open source OpenCyc draws is, to put it simply, an encyclopedia of knowledge about the world put in a form that machines can reason from. That system, whose development was begun nearly 25 years ago, was designed to address the problem with the brittleness of expert systems, in that such systems only know what they need to know to do the job. For example, an expert system in the medical field can provide a diagnosis of a disease for a person, but it doesn’t understand what a person or disease actually is.

“It’s what we know about the world that we don’t have to write — that water is wet, or is a liquid, and what is a liquid,” says Larry Lefkowitz, executive director for customer solutions. “You need that information in a way that machines can reason, and you want machines to reason.”

So, if you know that liquids flow freely and that water is a liquid, the machine should reason that water flows freely. “It’s a collection of millions of facts about the world. But from that million it can conclude many other things,” he says.
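
As a rough illustration of how a machine can conclude new facts from stored ones, the sketch below applies a single inheritance rule until nothing new can be derived. It is not Cyc's inference engine; the triple format, the "isa" relation name, and the sample facts are assumptions chosen for the example.

```python
def infer(facts):
    """Forward-chain one rule: if (x isa c) and (c rel value), then (x rel value)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for subject, relation, category in list(known):
            if relation != "isa":
                continue
            for owner, rel, value in list(known):
                derived = (subject, rel, value)
                if owner == category and derived not in known:
                    known.add(derived)
                    changed = True
    return known

# Hypothetical starting facts, written as (subject, relation, object) triples.
FACTS = {
    ("water", "isa", "liquid"),
    ("liquid", "isa", "substance"),
    ("liquid", "flows", "freely"),
}

DERIVED = infer(FACTS)
```

From three stored facts the loop derives two more: that water flows freely and that water is a substance. It does not conclude that every substance flows freely, because the rule only passes properties down from a category to its members.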

The commercial version sports hundreds of thousands of concepts — for example, that a bird is an animal with feathers, and beaks, and flies — and tens of thousands of ways of relating these concepts. The system distinguishes between concepts and the words that refer to them — for instance, the concept of dog is that it is a pet, and a mammal, no matter what word you call it by, whereas the concept for the English word “dog” includes that the word is a noun, and that you can count nouns, and so on. The OpenCyc version has all the concepts, and Cycorp has released a portion of its knowledge about the relationships among concepts. There’s enough there for people to draw from if they want to write their own concepts or relationships, Lefkowitz says.

The company expects to make available in short order web services — for example, a free version of a document tagging service. Commercial web services are expected to follow, which users can take advantage of without having to license the overall knowledge base.

With some 200,000 concepts, people can get very precise in how they tag content and its relationship to other information, he says. And now there’s easy access to the concepts via URIs, which makes it easier to write applications. That could be especially useful for small bloggers, information aggregators, and publishers who want to more seamlessly leverage the power of the semantic web.

“We’re trying to provide a skeleton or spine or backbone that people can choose to hook up to that is broader than a couple of the high level ontologies trying to serve the same purpose,” says Lefkowitz. “But they don’t have the depth or representational power.”

Cogito Helps Connect Ads to Content

Jennifer Zaino
SemanticWeb.com Contributor

Cogito Semantic Advertiser wants to help companies get better at connecting ads to web page content, to improve click-through rates and generate profits.

Expert System USA, the developer of Cogito Monitor, has released Cogito Semantic Advertiser, a tool aimed at big media houses and publications, which automatically processes the meaning of text to ensure ads are placed on relevant Web pages.

The Cogito Language Technology Platform of semantic intelligence, the engine behind the tool, ensures that ads are placed appropriately to increase click-through rates, by automatically analyzing Web pages to identify the most relevant topics and extracting the main themes included in the text. According to the company, the tool classifies content by assigning the category related to the text in real time, based on an optimized taxonomy and high precision. It collects all useful data in an output format structured to be easily uploaded into a database; and directly integrates with the ad server, enabling the selection and serving of the most relevant ads for the Web page. Expert says its technology processes the page and the ad as they are created and stores the semantic results in a database, so the real test of performance is from a database perspective; if the database is configured correctly, processing can take considerably less than two seconds.

Are advertisers starting to ponder the effectiveness of current ad-serving techniques, which tend to rely on keyword frequency to place ad copy without considering the meaning? As an example, Cogito notes that that approach could lead to having an ad for a Caribbean vacation package inadvertently appear near an article about a massive hurricane that recently hit that region. Users reading about that event are unlikely to be in the market to vacation in that area. But that same ad placed next to a story about a particular resort in the Caribbean, or tropical cocktails, or even expectations for a freezing winter, should do much better.

“We do know that as little as a 5 percent improvement in the connectedness between ads and content improves click rates by as much as 50 percent,” says Expert’s CEO Brooke Aker. That should increase the ad’s ability to generate profits.

Expert also talks about the tool’s ability to make socio-cultural correlations for ad placement — determining that a user interested in high-end wines, for example, is a person who is also likely to be interested in fine dining — and using that data to help advertisers reach these new customers.

“For an example of how an advertiser can leverage the socio-cultural correlations for ad placement, consider that a person interested in organic food is likely to be interested in buying an ecological car,” says Aker. “This is because studies in social behaviour have proven that an interest in healthy diets and a concern with ecology tend to be shared, in most cases, by the same types of people. These relations are not direct, yet they are incisive and effective, and they could not be discovered nor applied by other technologies.”

Aker also notes that companies that use Expert’s Cogito Monitor tool, which uses the vendor’s semantic smarts to help advertisers understand what customers are saying online about their products, can leverage that in conjunction with Cogito Semantic Advertiser to help in campaigns. “Trends and connections between products and socio-economic groups or socio-demographic groups are revealed by Monitor and can be engineered into the target audiences of Advertiser during ad design time,” Aker says.

Cogito Semantic Advertiser is available in Italian and English now. German is on the way and other languages are very possible, Aker says.

“For example, our core technology covers Arabic and another Middle Eastern language. We are looking at French and Chinese currently,” he says. Expert System plans to sell Cogito Semantic Advertiser in the $100,000 to $200,000 range, depending on implementation size requirements.

SemantiNet Hits the Internet Stage

Jennifer Zaino
SemanticWeb.com Contributor

Earlier this month, semantic startup SemantiNet announced that it had secured $3.4 million in Series A funding from Israel’s leading venture capital fund, Giza Venture Capital, as well as several private investors. Today, it unveils the first fruits of those efforts.

So, what’s all the buzz about? The browser plug-in, which initially will work with Firefox with an Internet Explorer version due shortly, is designed to help people discover content they are not actively searching for, says founder Tal Keinan. The company has taken advantage of APIs to access the content of sites with appeal to the social web and digital lifestyle crowd, such as Wikipedia, Facebook, Twitter, YouTube, Flickr, and Amazon, encapsulating the source’s data with a semantic layer that describes the information each source provides.

For instance, Amazon provides information about products, and the characteristics of products are price, pictures, ratings, reviews, and reviews are written by people on certain dates. With Facebook, the semantic layer can encapsulate information such as a person’s name, location, birth date, friends’ list, interests, and such. Once the product identifies objects on a page, and a user interacts with it, SemantiNet goes ahead and retrieves all the relevant content around that object, taking into account user interests, tastes, relationships, and so forth.

SemantiNet’s proprietary engine understands how the different pieces of information connect together and automatically creates those connections — contextually driven dynamic mash-ups, if you will. For instance, say you’re looking at a profile page on Facebook of a friend, and it indicates he’s a fan of a particular band; clicking on the band will bring up a bunch of information from the web such as videos, songs, concert dates, and even who you know who might be going to what concert on what date. Keinan says he had that experience himself, and saw through the Twitter connection that someone who he was friends with on Facebook but not on Twitter was attending the same concert he was going to on the same day. “I wouldn’t have known that otherwise,” he says.

That gets to the heart of what Keinan sees as a problem that has grown along with the web.

“We are reaching a point where there is so much information contained in different sites and the level of noise surging around those sources. People understand that there needs to be some change in the way people consume content,” he said.

Some of what SemantiNet aims at doing sounds familiar, and Keinan says he gets the most questions about how it compares to AdaptiveBlue.

Your Second Space, The Semantic Way

Jennifer Zaino
SemanticWeb.com Contributor

Dreaming of a castle in Spain? A hideaway on Florida’s Gulf Coast? A cabin in the mountains anywhere — as long as it’s far away from civilization?

Apparently, you’re not alone. Recession or not, there’s enough disposable income out there to power the second-home lifestyle, and that’s a lifestyle now being capitalized on by semantic web start-up SecondSpace. SecondSpace operates ResortScape.com and Landwatch.com as lifestyle matchmaking services for those who want the most accurate information on properties that meet their very specific requirements. Instead of searching for, say, a two-bathroom, four-bedroom home in a specific neighborhood, as you normally would for a primary residence, the semantic search behind SecondSpace lets users search for the second home of their dreams by entities and attributes.

An entity is anything SecondSpace can model with attributes around it — a home, for example, can be defined by attributes that include its square footage but also that it is close to a golf course. So, for example, you can look for a 3-plus bedroom condo near Miami Beach with a pool and spa, or over 40 acres in Montana with power available, or 20+ acres in northwestern Montana with a view of the lake, or a home in Westchester, New York, near a golf course.
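
Searching by entities and attributes can be pictured with a short sketch. This is only an illustration of the concept, not SecondSpace's search engine; the listing records, field names, and the `search` helper are hypothetical.

```python
# Hypothetical listings, each modeled as an entity with attributes.
LISTINGS = [
    {"id": "condo-1", "kind": "condo", "bedrooms": 3,
     "near": {"Miami Beach"}, "amenities": {"pool", "spa"}},
    {"id": "condo-2", "kind": "condo", "bedrooms": 2,
     "near": {"Miami Beach"}, "amenities": {"pool"}},
    {"id": "land-1", "kind": "land", "acres": 45, "state": "Montana",
     "near": set(), "amenities": {"power available"}},
    {"id": "home-1", "kind": "home", "bedrooms": 4, "state": "New York",
     "near": {"golf course"}, "amenities": set()},
]

def search(listings, equals=None, at_least=None, includes=None):
    """Return ids of listings that satisfy every attribute constraint."""
    equals, at_least, includes = equals or {}, at_least or {}, includes or {}
    matches = []
    for item in listings:
        # Exact-value attributes, such as the kind of property or the state.
        if any(item.get(key) != value for key, value in equals.items()):
            continue
        # Numeric minimums, such as "3-plus bedrooms" or "over 40 acres".
        if any(item.get(key, 0) < minimum for key, minimum in at_least.items()):
            continue
        # Set-valued attributes must contain everything that was asked for.
        if any(not wanted <= item.get(key, set()) for key, wanted in includes.items()):
            continue
        matches.append(item["id"])
    return matches

CONDO_HITS = search(
    LISTINGS,
    equals={"kind": "condo"},
    at_least={"bedrooms": 3},
    includes={"near": {"Miami Beach"}, "amenities": {"pool", "spa"}},
)
LAND_HITS = search(
    LISTINGS,
    equals={"state": "Montana"},
    at_least={"acres": 40},
    includes={"amenities": {"power available"}},
)
```

The two queries mirror the examples in the text: a 3-plus bedroom condo near Miami Beach with a pool and spa, and over 40 acres in Montana with power available.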

The technology behind SecondSpace enables it to infer a lot of information about the surroundings and lifestyles associated with properties that wouldn’t normally be catalogued by a realtor or individual listing a property. That information can be critical to purchasers whose primary residence may be very far from the property they want to buy, and who may not know enough about the area to get a comprehensive view of all their options. For instance, they may know they want a location in Florida close to the beach, but if they hail from overseas their experience of Florida may have been limited to a single visit to Miami, and thus it would be difficult for them to pinpoint specifically what area they are looking to buy a home in, says Gary Cowan, director of product management.

“We can bring the relationships between entities to the user in a logical fashion,” says co-founder and CTO Alok Sinha. For example, you can search for a home in Florida — specifically a condo near the beach at a certain price, and get a comprehensive list of properties in South Beach, Amelia Island, Pompano Beach, St. Augustine, and more. Those searches can be further narrowed — for example, you can further specify that you want that property to have access to a tennis court and golfing. “We use semantic search as the best way to surface our properties, articles, and content modules,” says Sinha.

Feeding these relationships is a whole bunch of data, gigabytes worth. Geo-meronymy is an example of its intelligent indexing, using patterns and verified domain data. SecondSpace can look at all the geometric data in the world, so that the latitude/longitude coordinates of each property are matched with other sources of geographic and spatial data. SecondSpace can take just the coordinates of a property and its square footage and supplement it with area information, climate information, and anything else that represents that property and why someone would be interested in it. Currently it is rich in data in the U.S. in this respect, and by August expects to be equally rich in Canada, India and Mexico.
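
One simple way to supplement a property's coordinates with surrounding features is a distance check against other geographic data. The sketch below uses the standard haversine great-circle formula; the coordinates, feature names, and the 5 km radius are made up for illustration and are not drawn from SecondSpace's data or methods.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby_features(prop, features, radius_km):
    """Names of features that lie within radius_km of the property."""
    return sorted(
        name
        for name, (lat, lon) in features.items()
        if haversine_km(prop["lat"], prop["lon"], lat, lon) <= radius_km
    )

# Hypothetical property record: just coordinates and square footage.
PROPERTY = {"lat": 25.80, "lon": -80.12, "square_feet": 1400}

# Hypothetical geographic features with illustrative coordinates.
FEATURES = {
    "beach": (25.79, -80.13),
    "golf course": (26.10, -80.20),
}

ENRICHED = dict(PROPERTY, near=nearby_features(PROPERTY, FEATURES, radius_km=5.0))
```

The enriched record gains a "near" attribute that the original listing never stated, which is the kind of inferred information a buyer could then search on.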

HealthMash Helps Sort Through Online Health Morass

Jennifer Zaino
SemanticWeb.com Contributor

HealthMash is a new knowledge base that bills its mission as promoting health and well-being by providing relevant information of high quality from trusted health sources on the Web, using sophisticated Web 2.0 universal search and discovery technology with Semantic Web Concepts.

Semanticweb.com recently conducted an e-mail interview with Endre Jofoldi, CEO of Budapest, Hungary-based WebLib, the developer of knowledge bases and specialized natural-language processing and search technologies that created HealthMash. WebLib provides natural-language processing tools and semantic engineering services to international clients in the U.S. and Europe, and some of its employees also work as individual contractors for the U.S. National Library of Medicine. Its customers include government agencies, universities, and major content providers in the U.S. WebFeat.org, the federated search engine vendor, licensed its clustering technology.

Its technologies, which include PolySpell, its English, medical, and web spellchecker, and its clustering engine PolyCluster (available for licensing as separate products), are showcased in PolyMeta.com and the AllPlus.com universal meta-search and discovery engine.

Semanticweb.com: What was the impetus for creating HealthMash?

Jofoldi: It was our personal experience with how difficult it is to find relevant health information of high quality from trusted sources on the Web. In addition, we have been aware of the increasing popularity and importance of consumers looking for health info on the Internet. Fortunately, we have considerable professional expertise in health information retrieval and knowledge bases, so we felt motivated to do something new and useful in this area.

Semanticweb.com: What is your vision for this offering — why, for instance, do you foresee it as more useful/better/differentiated from something like WebMD or other sources of health information?

Jofoldi: There are a lot of good sources of health information on the Web, including WebMD, Mayo Clinic, and Medlineplus, to mention a few. There are even more questionable and/or outright dangerous sources of information, and often the major search engines mix the good and the bad data in their search results. When it comes to one’s health, “second opinions” are very important not only when consulting with doctors, but also in accessing multiple sources of reliable information. By combining a focus on quality information sources with a comprehensive semantic health knowledge base and Web 2.0/Web 3.0 universal meta-search and discovery, we hope to make the best unbiased and personally relevant health information available to people.

The Semantic Question: To Delete or Not To Delete

John Clarke Mills
SemanticWeb.com Contributor

A few months back I posed a question to the folks at DERI (Digital Enterprise Research Institute) from the National University of Ireland, Galway, when they came to visit Radar Networks. This is a question that I have struggled with for a long time, seeking answers anywhere I could. When I asked them, I heard the same question in response that I always hear.

Why? Why would you ever want to delete something out of the ontology?

Ontologies were not originally created to have things removed from them. They were created to classify and organize information in a relational format. Just because a species goes extinct doesn’t mean it should be removed from the domain of animals, does it? The same goes for a car that isn’t manufactured anymore or a disease that was officially wiped out. These and many more are probably the reasons why Semantic Web gurus and ontologists alike don’t like the idea of deleting entities.

I am helping to create a social site where users generate content: objects, notes, data, and connections to other people and other things that are their own. If they want to delete something that they have created, so be it. Sounds easy, right? Well, yes and no. This problem is dealt with throughout computer systems. It is essentially all about managing pointers in memory. You can’t delete something that other things are pointing to. Who knows what will happen or how our application will respond when certain pieces of information are missing? Some things we account for on purpose because they are optional, but some things we just can’t. Every application has its unique constructs, whether it is built on a relational database or a triple store.

So what I have to do is define ontological restrictions, stating what links can and cannot be broken. On top of that, we must worry about permissions. Am I allowed to break an object link from someone else’s object to my own? Also, what if the object being deleted has dependent objects that cannot exist alone, or more importantly, don’t make sense on their own? A great example of this is a user object and a profile object. There should either be zero or two, never just one.

My friend and coworker Jesse Byler had dealt with a similar problem in the application tier a few months back regarding dependent objects. He had written a recursive algorithm that would spider the graph until it hit the lowest leaf node matching certain criteria and then begin to operate. I took this same principle and pushed it down into our platform and began to mark restrictions in our ontology.
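
A minimal sketch of this idea follows. It is not the Radar Networks implementation; the link types, the three policy labels ("cascade", "breakable", "restrict"), and the object names are assumptions made up to illustrate marking restrictions on links and walking the graph recursively.

```python
class DeletionBlocked(Exception):
    """Raised when a link that may not be broken points at a doomed object."""

def cascade_closure(obj, links, policy, seen=None):
    """Recursively collect obj plus every dependent object reachable via cascade links."""
    seen = set() if seen is None else seen
    if obj in seen:
        return seen
    seen.add(obj)
    for source, link_type, target in links:
        if source == obj and policy.get(link_type) == "cascade":
            cascade_closure(target, links, policy, seen)
    return seen

def plan_deletion(obj, links, policy):
    """Return the set of objects to delete together, or raise if a restriction forbids it."""
    doomed = cascade_closure(obj, links, policy)
    for source, link_type, target in links:
        outside = source not in doomed
        if target in doomed and outside and policy.get(link_type) == "restrict":
            raise DeletionBlocked(f"{source} -{link_type}-> {target}")
    return doomed

# Hypothetical restrictions attached to link types in the ontology.
POLICY = {
    "hasProfile": "cascade",   # a profile cannot exist without its user
    "profileOf": "cascade",    # and a user cannot exist without its profile
    "mentions": "breakable",   # optional link, safe to drop
    "ownedBy": "restrict",     # must not be left dangling
}

# Hypothetical graph, written as (source, link type, target) triples.
LINKS = [
    ("user:ann", "hasProfile", "profile:ann"),
    ("profile:ann", "profileOf", "user:ann"),
    ("note:1", "mentions", "user:ann"),
]

PLAN = plan_deletion("user:ann", LINKS, POLICY)
```

With these policies the user and profile are always removed as a pair, which matches the "zero or two, never just one" rule. Adding a link such as ("group:7", "ownedBy", "user:ann") would cause `plan_deletion` to raise instead of leaving the group pointing at nothing.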


John Clarke Mills is an application engineer at San Francisco startup Radar Networks, attempting to bring the Semantic Web to life with their first commercial product, Twine.com. Twine is a new service that helps you organize, share and discover information about your interests, with networks of like-minded people. Before coming to Radar, John began his career as an engineer for CNET Networks.

E-Discovery Gets Boost From Semantics

Jennifer Zaino
SemanticWeb.com Contributor

Semantic technologies have a role to play in the e-discovery space, potentially saving companies a lot of money and headaches when it comes to producing required information that’s in electronic form for the courts. Ten years ago, if information wasn’t on paper it didn’t really exist from the standpoint of evidence at trial. Things are dramatically different today, with the explosion in electronically stored data.

That leaves many companies grappling with the issues of determining how to preserve data when there is a duty to do so, how to collect it in ways that maintain its authenticity and integrity in terms of being able to introduce it into evidence at a later point, and, once it’s collected, how to identify data that is relevant to a particular matter, and produce it in the format requested by the opposing party.

E-discovery solutions provider Fios recently enhanced its e-discovery services with Content Analyst’s CAAT conceptual search and analytical software to help clients search and analyze large amounts of electronically stored information during review and in the early stages of e-discovery planning. The goal is to make the e-discovery process more efficient, enable organizations to meet timeframes for producing relevant data, and reduce their risk (for example, the risk of potentially exposing themselves to consequences by producing data they really don’t have to).

Fios actually introduced the idea of concept-based searching in 2003 through its partnership with intelligent search vendor Engenium — since acquired by Kroll Ontrack — but CAAT adds some additional capabilities around scalability and efficiency.

“In terms of scalability and performance, its speed of search results, speed of indexing, maintaining the index — those things allow us to continue to offer similar capabilities but more efficiently, robustly, with faster search results,” says Brad Harris, director, Discovery Center of Excellence, Fios Consulting. “In addition, longer-term, Content Analyst also has capabilities around concept-clustering, the way you can manipulate the information once you have a conceptual space built in order to better understand and provide better analytics around a population of documents.”

Its concept-based searching approach is based on latent semantic indexing (LSI) or analysis, a natural language processing technique to discover relationships among documents and the terms and words in them by producing a set of concepts related to the documents and terms. Within a group of documents, some terms may seem to be highly related to one another in context, says Harris; for example, a company might use an abbreviation and a code name to refer to the same project.
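
The abbreviation and code name case can be shown with a toy version of latent semantic indexing. This is not Content Analyst's CAAT software. The corpus is invented, "px" and "bluebird" are hypothetical stand-ins for an abbreviation and a code name, and a plain power iteration on the document-by-document matrix is used here in place of a full singular value decomposition routine.

```python
import math

def term_document_counts(docs):
    """Map each term to its row of counts across the documents."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    return {term: [doc.split().count(term) for doc in docs] for term in vocab}

def power_iteration(matrix, iterations=200):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(n) for j in range(n))
    return value, vec

def lsi_term_vectors(docs, k=2):
    """Project each term onto the top k concept axes of the corpus."""
    counts = term_document_counts(docs)
    n = len(docs)
    # Gram matrix: how strongly each pair of documents overlaps in vocabulary.
    gram = [[sum(row[i] * row[j] for row in counts.values()) for j in range(n)]
            for i in range(n)]
    axes = []
    for _ in range(k):
        value, vec = power_iteration(gram)
        axes.append(vec)
        # Deflate so the next pass finds the next strongest concept.
        gram = [[gram[i][j] - value * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return {term: [sum(c * v for c, v in zip(row, axis)) for axis in axes]
            for term, row in counts.items()}

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

# Invented corpus: "px" and "bluebird" never appear in the same document.
DOCS = [
    "px budget review schedule",
    "bluebird budget review schedule",
    "px launch schedule",
    "bluebird launch budget",
    "lunch menu cafeteria",
    "cafeteria lunch hours",
]

RAW = term_document_counts(DOCS)
CONCEPT = lsi_term_vectors(DOCS, k=2)
RAW_SIMILARITY = cosine(RAW["px"], RAW["bluebird"])
CONCEPT_SIMILARITY = cosine(CONCEPT["px"], CONCEPT["bluebird"])
UNRELATED_SIMILARITY = cosine(CONCEPT["px"], CONCEPT["lunch"])
```

On raw counts the two project names look unrelated because they never co-occur. After projection onto two concept axes they land close together because they keep the same company (budget, schedule, launch), while an unrelated term such as "lunch" stays far away.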

Excel and Semantic Web Unite

Jennifer Zaino
SemanticWeb.com Contributor

Spreadsheets, meet the Semantic Web.

In May, Semanticweb.com reported on a challenge issued by Brand Niemann of the Environmental Protection Agency about “semantifying” the wealth of government data locked away in spreadsheets. That’s a challenge that resonates with Brian Donnelly, founder and CEO of In Silico Discovery, which has developed the Semantic Discovery System (SDS).

Donnelly, a pioneer in the field of enterprise architecture integration as the founder of Constellar Corp. (which was acquired by IBM), says his vision is being “the first company to bring the semantic web to the desktop.” The desktop is where Excel reigns, and also where businesses come face to face with the problem of trying to integrate large volumes of data that exist in separate worksheets to be able to do sophisticated querying around that information.

The Semantic Discovery System, which has been in beta, is set to become generally available as a download next quarter. It has its heritage in the work Donnelly has done with large life sciences organizations, in the area of drug discovery.

“There’s a theme that says spreadsheets are not going away,” says Donnelly. “So we need to control them. In the drug discovery world, everyone uses spreadsheets, and they discover really valuable things with them. But you end up cutting and pasting all over the place, and you need a better way of linking them together, to query between the spreadsheets.”

And, the company believes, you need a way to create those queries and present the results, using a powerful engine that utilizes semantic web standards (OWL, SPARQL, RDF) under the hood, in an easy-to-digest graphical interface. “You don’t have to write joins or other things; you can ask questions of your data without any programming. Just let SPARQL do the work in the background,” Donnelly says.

Researchers Focus on Semantic Web Incentives

Jennifer Zaino
SemanticWeb.com Contributor

What needs to happen to bring about the Semantic Web on a broad scale? The organizers of an upcoming workshop, Incentives for the Semantic Web, to be conducted at the 7th International Semantic Web Conference in October in Karlsruhe, Germany, will be exploring that question.

Semanticweb.com recently conducted an interview by email with the workshop organizers: Katharina Siorpaes and Elena Simperl, researchers at the Semantic Technology Institute (STI), and Denny Vrandecic of the Institut AIFB, Karlsruhe, Germany.

Q: What prompted you to organize a workshop on this topic? Are you concerned that the semantic web is developing too slowly, and if so, what do you believe has held it back and kept people from contributing to semantic content creation?

A: We have been working on this topic for several years. In OntoGame [which weaves the Semantic Web into online, multi-player game scenarios], for instance, we investigate how tasks related to the creation of semantic content, which rely heavily on human input, can be hidden behind cooperative online games. Another example is Semantic MediaWiki, a wiki-based platform that allows the creation of semantic data in a simple and minimally invasive fashion. The created data can be immediately used within the platform and brings visible benefits to the system user.

The principle underlying such approaches is that semantic technology must not only be easy to use, but also rewarding: people need a clear incentive to invest time in building ontologies or annotating content semantically. The technology has matured in recent years, but its adoption is possible only if a critical mass of content is available. And one promising way to achieve this is by making the technology accessible to and appealing for a broad audience.

Q: The information about the workshop draws a contrast between the success of web 2.0 and the relative lack of success of the semantic web. Can you describe what you think made so many Web 2.0 apps a success and how that reflects on what is lacking around the semantic web?

A: The majority of the applications we call 2.0 are easy to use and offer an incentive for the users, be that the need for affiliation to a community, entertainment, competitive spirit, peer recognition, or reciprocity. From a technological perspective, Web 2.0 did not bring many new things, while the Semantic Web is based on a complex, innovative technological stack. What we can learn from Web 2.0 is how to motivate people to contribute to semantic technology and how to make its benefits visible to its adopters. This is, however, not limited to “Web 2.0 incentives.”

Q: At this point, where would you have expected — or hoped — the world to be in terms of the development of semantic applications of immediate added value, and for the adoption of semantic technologies at the industrial level? How far do you believe we actually are from that hope/expectation?