Archives: April 2008

Freebase Reaches Out

Jennifer Zaino
SemanticWeb.com Contributor

Metaweb, the creator of Freebase — billed as “an open, shared database of the world’s knowledge” — has hired a new director of community. A big part of her role will be to grow and renew community involvement in support of that mission.

“We have this solid backend, lovely web site, an API that is pretty stable, so the platform is there,” says Kirrily Robert, recently transplanted from Australia to California to take on the challenge. “Phase two, which we are now entering, is to build up the data and community. This is a database of the world’s information — we want to have everything in there.”

“Shoveling it in there takes a lot of people. We can’t do that as a company on our own — we need the world to join in with us,” Robert says.

As Freebase prepares to enter the beta stage, the small, tight-knit team that has built Metaweb has to move from being heads-down working on the platform to “opening it up and talking to the whole world,” she says. “That’s changed the focus and importance of the community aspect of things.”

“Bootstrapping” is the word Robert uses to describe her goals. For every piece of data Freebase puts into its database, it gets one piece of data back. That’s nice, but she’s looking for the snowball effect of crowd involvement.

“What can we do that causes five people to get enthused to do something else that causes 25 people to get enthused? So, instead of us shoveling data in, we want to organize data mobs around topics that may be interesting or controversial or have some personal applicability to people. Then if you can get a member of the community to edit one topic, next they are clicking to another. It’s like when three hours pass while you’re on Wikipedia and how did you get there. We want to trigger that behavior and get the community to support each other to build a rolling ball of enthusiasm.”

How so? For starters, Robert sees that the community is already broad as well as overlapping, extending from those who just casually drop by to find a fact (such as how Franklin Delano Roosevelt died), to those who develop against the Freebase API, to those who want to use the platform to build a community around their domain-specific interests (be it vintage motorcycles or tofu recipes), to fans of Creative Commons and the idea that information should be free.

Read more

AdaptiveBlue Publishes New Markup Format

Jennifer Zaino
SemanticWeb.com Contributor

Smart browsing vendor AdaptiveBlue is adding to the ranks of markup technologies with the introduction this week of AB Meta, an open format for annotating web pages with information about the objects they describe.

The format, developed in collaboration with a large web company that AdaptiveBlue cannot yet disclose, is distinguished by its simplicity and human readability, according to the vendor. AdaptiveBlue also notes that its unnamed partner will soon announce that this format is part of a broader set of semantic markup technologies. One clue to the type of company that partner may be is that, when asked about search engine support for the AB Meta declaration, CEO Alex Iskold notes that people will want to keep an eye out for the upcoming announcement.

Some details about the new format: using standard meta tags inside the head element of an HTML page, AB Meta lets site owners describe the primary object on the page in a very simple way, AdaptiveBlue says.

“Meta with English-like attribute names is as simple as it can be,” says CEO Alex Iskold. “The limitation is that only object properties can be expressed but not relationships.”

A page featuring a book, for example, can specify within meta name tags the object type content as a book, along with the book’s title, author, ISBN, year of publication, and so on. The format is based on the eRDF standard, a syntax of RDF that can be embedded into XHTML or HTML by using common idioms and attributes, but adds more specificity. Because it leverages existing formats and vocabularies, such as Dublin Core, many attributes can be described either with the AB Meta name or with a name from an existing standard, according to AdaptiveBlue. As long as the object.type meta header is present, they should be treated the same way by search engines.
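
Based on that description, the head of a page about a book might look something like the following sketch. Only the object.type header is named in the article; the other property names are illustrative guesses, not the official AB Meta vocabulary:

```html
<head>
  <title>A Study in Scarlet</title>
  <!-- AB Meta sketch: describe the page's primary object with plain,
       English-like meta names. Only "object.type" is confirmed by the
       article; the remaining names are hypothetical. -->
  <meta name="object.type" content="book" />
  <meta name="book.title" content="A Study in Scarlet" />
  <meta name="book.author" content="Arthur Conan Doyle" />
  <meta name="book.year" content="1887" />
  <!-- Per AdaptiveBlue, names from existing vocabularies such as
       Dublin Core could reportedly be used instead, as long as the
       object.type header is present. -->
</head>
```

A search engine that recognizes the format would then know the page is about this particular book, rather than merely containing those words.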

What would the adoption of this format by the public at large mean to AdaptiveBlue?

“Wider adoption would mean that BlueOrganizer would recognize things on more pages,” says Iskold. “The current version of the product already contains support for AB Meta. This is also a big benefit to search engines, as they can return more precise results. So overall for publishers this means that the content is more discoverable.”

AdaptiveBlue recommends that publishers who use WordPress for their blogs use the HeadMeta plug-in to specify the meta headers.

The Semantics of the Dublin Core – Metadata for Knowledge Management

In my previous article, I proposed that the library catalogue could be used as a blueprint for the Semantic Web. Perhaps theoretical and conceptual, the arguments fleshed out the ideas but not the practical applications. For this article, I will outline in greater detail how exactly developments in library and information science are playing out, not only in the SemWeb but also in knowledge management for business.

Read more

Semantic Web Project Supports DoD Efforts

Jennifer Zaino
SemanticWeb.com Contributor

The US Space and Naval Warfare Systems Center Charleston is leading an effort called aXiom, which it expects will advance the promise of service-oriented architectures, thanks to the use of semantic web technologies for information accessibility and application integration.

The ultimate goal: applying semantic technologies for data management, as well as providing end users the ability to create their own knowledge models, in order to improve situational awareness.

Technology solutions provider CommIT Enterprises Inc. is supporting the effort, which will include a set of open source and commercial tools to build a reusable architecture that can be leveraged across different projects within aXiom’s scope. Today, Thetus Corp. announced that CommIT is using the Thetus Publisher knowledge-modeling tool to apply context to data.

The context-based nature of the framework — in which ontologies, inference engines, and rules engines allow mapping between concepts and data elements to support the discovery and analysis of new information and non-obvious relationships — ensures that the correct data is presented to the appropriate user at the right time, according to Thetus. That is critical for having the flexibility to respond to changing environments, both in terms of mission types and new quantities of information inside and outside the DoD.

“Because we support agencies doing analysis of real-world events and missions, the sources of information they have to consume are changing all the time,” says Cameron Hunt, CommIT context architect.

Take, for example, evaluating sources of information on commercial shipping, and trying to assess the threat that ships coming into ports may pose. Various data sources might only provide information on the vessel’s name, the last ports it was in, the country where it originated, and the companies affiliated with it.
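
To make the data-fusion problem concrete, here is a minimal, hypothetical sketch of combining such partial records into a single vessel profile. The field names, sources, and values are all invented for illustration; they are not drawn from aXiom or Thetus Publisher:

```python
# Hypothetical sketch: fusing partial vessel records from several
# feeds into one profile keyed by vessel name. All field names,
# feed names, and values are invented for illustration.

def merge_vessel_records(records):
    """Combine partial records; the first source to supply a field wins."""
    profiles = {}
    for record in records:
        name = record["vessel_name"]
        profile = profiles.setdefault(name, {})
        for field, value in record.items():
            profile.setdefault(field, value)  # keep first value seen
    return profiles

# Three feeds, each knowing only part of the picture.
port_feed = {"vessel_name": "MV Example", "last_ports": ["Rotterdam", "Lagos"]}
registry_feed = {"vessel_name": "MV Example", "flag_country": "Panama"}
ownership_feed = {"vessel_name": "MV Example",
                  "affiliated_companies": ["Example Shipping Ltd."]}

profiles = merge_vessel_records([port_feed, registry_feed, ownership_feed])
print(profiles["MV Example"]["flag_country"])  # Panama
```

The hard part, of course, is exactly what this sketch glosses over: deciding that records from different feeds refer to the same vessel, which is where the ontology-and-inference mapping described above comes in.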

Read more

Difficulties with the Classic Semantic Approach

Alex Iskold
SemanticWeb.com Contributor

The original vision of the semantic web as a layer on top of the current web, annotated in a way that computers can “understand,” is certainly grandiose and intriguing. Yet, for the past decade it has been a kind of academic exercise rather than a practical technology. This article explores why, and what we can do about it.

The semantic web is a vision pioneered by Sir Tim Berners-Lee, in which information is expressed in a language understood by computers. In essence, it is a layer on top of the current web that describes concepts and relationships, following strict rules of logic.

The purpose of the semantic web is to enable computers to “understand” semantics the way humans do. Equipped with this “understanding,” computers will theoretically be able to solve problems that are out of reach today.

For example, in a New York Times article, John Markoff discussed a scenario in which you would be able to ask a computer to find you a low-budget vacation, keeping in mind that you have a three-year-old child. Primitively speaking, because the computer would have a concept of travel, budget, and kids, it would be able to find the ideal solution by crawling the semantic web in much the same way Google crawls the regular web today.

But while the vision of a semantic web is powerful, it has been over a decade in the making. A lot of work has been done at the World Wide Web Consortium (W3C) specifying the pieces needed to put it together. Yet, for reasons ranging from conceptual difficulties to a lack of consumer focus, the semantic web as originally envisioned remains elusive. In this post, we take a deeper look at the issues and wonder if the classic bottom-up approach can ever work.


Classic Semantic Web Review

In our post earlier this year, The Road to the Semantic Web, we discussed the elements of the classic semantic web approach. In a nutshell, the idea is to represent information using mathematical graphs and logic in a way that can be processed by computers. To express meaning, the classic semantic web approach also advocates the creation of ontologies, which describe hierarchical relationships between things.

For example, using such ontologies it would be possible to express truths like: dog is a type of animal or Honda Civic is a type of car. It would then also be possible to describe the relationships between things like this: dog is eating food and John is driving a Honda Civic. By combining entities and relationships and expressing all content on the web in such a way, the result would be a giant network, or, the semantic web.
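
The underlying graph idea can be illustrated without any RDF tooling. In this toy Python sketch (not real RDF or OWL), facts are subject-predicate-object triples, and a small rule walks the type hierarchy the way an ontology reasoner might:

```python
# Toy triple store illustrating the classic semantic web idea:
# facts are (subject, predicate, object) triples, and simple
# rules derive new facts from the ones that are stated.

triples = {
    ("dog", "is_a", "animal"),
    ("Honda Civic", "is_a", "car"),
    ("car", "is_a", "vehicle"),
    ("dog", "eats", "food"),
    ("John", "drives", "Honda Civic"),
}

def is_a(thing, kind):
    """Follow 'is_a' links transitively, as an ontology reasoner would."""
    if (thing, "is_a", kind) in triples:
        return True
    parents = {o for (s, p, o) in triples if s == thing and p == "is_a"}
    return any(is_a(parent, kind) for parent in parents)

print(is_a("Honda Civic", "vehicle"))  # True: Honda Civic is_a car is_a vehicle
print(is_a("dog", "car"))              # False
```

The point of RDF and OWL is to express exactly this kind of graph, but in a standardized, web-scale, distributed way; the difficulties discussed below stem largely from what that standardization costs.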

The W3C has mapped out a set of tools and standards needed to make it happen, two of which are the XML-based languages RDF and OWL, designed to be flexible and powerful. To accommodate the distributed nature of the semantic web, documents are made self-describing: the metadata (meaning) is embedded in the document itself. The entire stack, as envisioned by Sir Tim Berners-Lee, was presented in 2000 (see image below). The rest of the post will focus on the difficulties with this approach.


The Technical Challenges

1. Representational Complexity: The first problem is that RDF and OWL are complicated. Even for scientists and mathematicians, these graph-based languages take time to learn, and for less technical people they are nearly impossible to understand. Because the designers were shooting for flexibility and completeness, the end result is documents that are confusing, verbose, and difficult to analyze.

2. The Natural Language Problem: People argue that RDF and OWL are for machines only, so it does not matter that people might find them hard to read. (Though, as a side note, an advantage of XML representation is precisely that people can look at it, mainly for debugging purposes.) But even assuming that RDF and OWL are for machines only, the question arises: how are these documents to be created?

Read more

Semantic Web Podcast: Zepheira’s Eric Miller

Paul Miller
SemanticWeb.com Contributor

In this podcast, Paul Miller talks with Eric Miller, president of Zepheira.

He discusses a project Zepheira has been undertaking to simplify conference management and enrich the delegate experience at next month’s Semantic Technology Conference in San Jose, Calif. The systems they have developed demonstrate some of the ways in which semantic web technologies can be integrated with existing processes in order to deliver increased value and functionality.


Listen Now

More Semantic Web Podcasts



Introducing the Semantic Web Gang Podcast

The regular monthly podcast will tap into the insights on the news of the moment from some of those at the forefront in bringing the Semantic Web vision to reality. Gang members for the first show are:

  • Greg Boutin
  • Mills Davis of Project 10X
  • Tom Heath of Talis
  • Alex Iskold of AdaptiveBlue
  • Daniel Lewis of OpenLink
  • Thomas Tague of Reuters

We shall be adding to the gang in the coming months, as well as introducing the occasional special guest. Check back for next month’s podcast.

Listen Now

This conversation was conducted on March 20. For further Talking with Talis podcasts on the emerging Web of Data, click here.

At Talis, Paul Miller is active in raising awareness of new trends and possibilities arising from wider adoption of the Semantic Web.

Everyzing Tools Tag Video, Audio Content

Jennifer Zaino
SemanticWeb.com Contributor

Everyzing today announced a management console for its ezSEARCH and ezSEO search optimization products. Both are software-as-a-service offerings that provide a foundation for big media companies to merchandise their media. They derive high-quality, time-stamped text output from audio and video content, tagged and indexed so that the metadata, named entities, and key concepts of these assets are discoverable on a corporate web site (ezSEARCH) or via a search engine (ezSEO).

CEO Tom Wilde talks about helping media companies build out their internal semantic webs to better participate in the web and search economies. That, he thinks, will be key to their ability to participate in the semantic web, at their own discretion and in their own good time.

“In talking to CTOs and heads of digital at big media companies, they are aware of the semantic web. I would venture to say very few of them can articulate its value. It’s still sort of an inside-baseball kind of thing,” says Wilde.

Right now, they are more interested in how to monetize their online multi-media assets through advertising opportunities. “But we pay close attention to it, because we know our customers will need it in a couple of years. And when they say there’s a new microformat for video news, we will say, ‘Here’s a translation application, so off you go.’”

More From Jupitermedia

Young Guns Driving Semantic Web (Part 1)

Military, Universities Team Up on Big CALO Project

Radar’s Twine Ties the Semantic Web Together

A Snapshot of Semantic Web Trends

Smartening Up Your Links

If you want to comment on these or any other articles you see on Intranet Journal, we’d like to hear from you in our IT Management Forum. Thanks for reading.

- Tom Dunlap, Managing Editor.

Today, Everyzing’s applications intersect with the semantic web in consistently marking up multi-media content with text, tags, and some degree of ontological organization.

“You’ve got large media conglomerates like NBC or Fox — they all have these corporate entities with dozens of brands, and under those brands there may be dozens of properties, which may have dozens of collections,” Wilde says. “At the bottom of this you have lots of content, in different formats, and it’s not marked up in any consistent fashion. The challenge is simply finding a way to distribute and repurpose their content within their own properties.”

After Everyzing’s products do their work, he says, you end up with a powerful layer across the whole enterprise that knows the semantic aspects of its content, and can exploit them.

“Once you have that, then translating it into a Semantic Web standard is trivial. All that is, is a translation engine,” he says. The question, though, is at what rate Semantic Web standards and microformats shake out for primetime use by the industry. “Giving them the power to create their own semantic representation of their content, and then letting them choose when and how to interface with standards on the web, is exactly what they need.”

With the template-based RAMP (Reach, Access, Monetization, and Protection) management console, media companies get a centralized platform for managing the metadata for their audio and video assets. The metadata also can be used to create rules that request ads from different sources, depending on events.

“Let’s say the publisher has a deal with a content provider who needs an ad of a certain source,” Wilde says. “If the content comes from this collection and mentions this word, then use the following ad tags and call this partner’s Doubleclick dart server. You can do this through the console — or change the rules and update new ad tags. This populates to all templates immediately.”

RAMP, Wilde says, rounds out the solution set. It will be included as a standard feature on both products, whose prices are based on a monthly capacity license dependent on how much content Everyzing processes for its customers and how many actions they host for them. “A lot of content is lost because it is not well tagged,” he says. “We say we will give you a scalable solution, so all content will be discoverable — and so you have the controls to edit and modify it if you need to.”
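
A rule like the one Wilde describes ("if the content comes from this collection and mentions this word, then use the following ad tags") can be sketched in a few lines. Everything here (rule fields, tag names, the matching logic) is a hypothetical illustration, not Everyzing's implementation:

```python
# Hypothetical sketch of content-triggered ad rules of the kind a
# console like RAMP might manage. All collection names, keywords,
# and ad-tag identifiers are invented for illustration.

AD_RULES = [
    {"collection": "evening-news", "keyword": "election",
     "ad_tags": ["partner-dart-server", "politics-sponsor"]},
    {"collection": "sports", "keyword": "playoffs",
     "ad_tags": ["sports-sponsor"]},
]

DEFAULT_AD_TAGS = ["house-ads"]

def ad_tags_for(collection, transcript):
    """Return the ad tags for the first rule matching this content."""
    for rule in AD_RULES:
        if rule["collection"] == collection and rule["keyword"] in transcript.lower():
            return rule["ad_tags"]
    return DEFAULT_AD_TAGS

print(ad_tags_for("evening-news", "Coverage of the election results"))
# ['partner-dart-server', 'politics-sponsor']
```

Updating a rule in one place and having it "populate to all templates immediately" then amounts to every page render consulting the same shared rule table.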

Dapper Gets Semantically Aware with Semantify

Jennifer Zaino
SemanticWeb.com Contributor

There’s a new service built on top of the Dapper platform, a web-based interface that enables web site publishers and users to create structured feeds that can extract and reuse content from any web page.

The new service is dubbed Semantify. Now in beta, it works with Dapper to let users create feeds, called Dapps, whose fields are based upon formats such as RDFa, eRDF, and various microformats, and then embed the code onto their web site pages. The semantically rich version they create of their web sites can then be indexed properly by semantically aware search engines — specifically, Yahoo’s new Slurp crawler. Users searching via the Slurp crawler will get a slightly modified version of the web page — the page plus the semantic markup — while those using non-semantic search engines will retrieve the page in its “normal” state. While the service currently works exclusively with Slurp, Dapper says it could easily be extended to support other semantic search engines.

Why? According to Dapper co-founder and CTO Jon Aizen, exposing site semantics to the search engine should result in improved search engine optimization and higher-quality traffic.
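
The crawler-specific serving described above can be sketched as a simple user-agent check: semantic markup is injected only when the requester looks like a semantics-aware crawler. This is an assumption-laden illustration (the detection logic and the markup are invented), not Dapper's actual mechanism:

```python
# Sketch of crawler-specific page serving: semantic crawlers get the
# page plus embedded markup; everyone else gets the "normal" page.
# The user-agent check and the markup are simplified assumptions.

SEMANTIC_CRAWLERS = ("slurp",)  # Yahoo's crawler identifies as "Slurp"

def render_page(base_html, semantic_markup, user_agent):
    """Return the page, embedding markup only for semantic crawlers."""
    if any(bot in user_agent.lower() for bot in SEMANTIC_CRAWLERS):
        return base_html.replace("</head>", semantic_markup + "</head>")
    return base_html  # non-semantic clients see the unmodified page

page = "<html><head></head><body>Recipe</body></html>"
markup = '<meta name="recipe.calories" content="180" />'
print(render_page(page, markup, "Mozilla/5.0 (Yahoo! Slurp)"))
```

The same base page thus serves both audiences, which is why publishers can adopt the feed without changing what ordinary visitors see.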


“The primary advantage is that because search engines can understand your content better, they can expose it to users with more relevancy and with more targeting,” Aizen says. As an example, owners of diet cooking or recipe sites could more easily get hooked up with traffic that is looking specifically for recipes under 200 calories — vs. just being directed to sites generically featuring “low-cal” recipes that may come up because some publisher thought to optimize that as a keyword.

“But if I expose calorie information, the semantic search engine can find that. Or, users could also search specifically for chocolate as an ingredient rather than as a product. Adding structured semantics lets people drill down more specifically, and then you don’t have to worry about how you’ve optimized specific keywords; you just produce the web sites, and the search engines will provide the tools for users to find what they are looking for, and so your site may be exposed more frequently,” Aizen says.

And it may be exposed to more relevant users who want the content you have. “Just getting traffic on your site — you get low value from that,” he says.

Aizen thinks that what Yahoo has done is going to have a big impact over time. “It’s not that people couldn’t use semantic versions of web sites before. The real question was, why bother? There was no incentive to invest time or money, because what benefit would you see? Maybe some eclectic person would use your feed, but now this provides a very clear incentive,” he says.

Read more

Semantic Integration – The SI of Tomorrow

Currently, when people use the acronym “SI,” they tend to refer to something known as ‘Systems Integration’ or specifically to ‘Systems Integrators.’ However, the nature of what a ‘system’ is and how that concept is evolving are going to change the way we look at this particular term in the relatively near future. I predict that within the next ten years, ‘SI’ in the context of information systems technology will primarily refer to ‘Semantic Integration.’

Read more

Web 2.0 Is Not the Enemy

Uche Ogbuji
SemanticWeb.com Contributor

Ah, version number battles. Whether in office application wars or DBMS wars, version number one-upmanship is a staple of the industry.

The RSS wars opened up a new popular front, using version number tactics to vie for legitimacy as the standard for Web feeds. Going even more “meta,” the “Web 2.0” concept emerged to near media hysteria, and one faction of Web experts felt a little left in the cold. You may have heard the term “Web 3.0” emerging from semantic Web technologists. I do hope this bit of version one-upmanship isn’t an act of war. Here’s one clear case when it’s worthwhile to give peace a chance.

“Think globally, act locally” was the famous saying of flower-child activists. The semantic Web is not a new idea. Almost as soon as the Web started blooming, its creator, Tim Berners-Lee, started looking for a better way to organize its information. The idea did not take off as dramatically as the Web itself, in part because there was a perception, right or wrong, that Web publishers had to learn complicated techniques to adapt their sites to a centrally planned Web world view. People started to assume that the semantic Web meant that they had to act globally.


“Web 2.0” came along to drive changes in convention, not technology, because there is almost no new technology in Web 2.0. Much of what it entails is thinking about how a Web site’s resources can be used globally, in ways that the webmaster might not even be able to anticipate. Then you use well-established technology to make local changes on your site, adapting it for global usage. Not much different from what the semantic Web was really advocating, but Web 2.0 had the benefit of emerging more gradually, through a series of developments, each of which a Web publisher could pick up and apply in a day.

The cornerstones of Web 2.0 are Web feeds (RSS, Atom, and even JavaScript object feeds), user-generated content, and mashups. Web feeds and user-generated content are generally the read and write aspects of a modern Web API. Rich Internet Application (RIA) technology such as AJAX is closely associated with Web 2.0, but there is some controversy as to whether it belongs in the big tent, with many observers seeking to separate the two more clearly. RIA is more of an incentive — the visible reward for improving a site’s Web architecture.

The good news is that Web 2.0’s core ideas each represent a small step toward the semantic Web. Small steps don’t satisfy everyone, hence the “Web 3.0” play to upstage the upstart — and also to ride some of its marketing coattails. There is some technical merit to the challenge, but there is also the danger that version wars merely induce the audience, Web publishers in this case, to roll their eyes and tune out. I think it’s important to acknowledge and understand how far Web 2.0 gets us toward a better Web, so we can train a clear eye on where and how to target further advocacy.

Read more
