Archives: January 2008

Europe Leads in Semantic Web Research (Part 2)

Jennifer Zaino
SemanticWeb.com Contributor

Europe is investing aggressively in semantic web research (see Follow the Money, Part 1), but those who lay the groundwork are not necessarily the most effective at commercializing it. This time, though, things might be different.

Government funding seldom translates into commercial products, points out Dr. Chris Harding, the U.K.-based forum director for SOA and semantic interoperability at The Open Group, a vendor- and technology-neutral consortium committed to enabling access to integrated information within and between enterprises based on open standards and global interoperability. Success will come to those who can match technical possibility with commercial requirements.


“So far, we only have a fuzzy picture of both of these, but it’s becoming clearer,” Harding said. “The first companies to see it really clearly and match the technology to the commercial requirement will gain significant advantage.”

Many European leaders in semantic web research, in fact, will readily admit that the U.S. historically has had the edge in terms of carrying ideas through and realizing them in successful commercial ventures.

“I don’t think historically Europe has been very good at anything compared to the U.S. in terms of commercialization,” says Dr. John Domingue, deputy director of the Knowledge Media Institute at the U.K.’s Open University, which carries out research related to the creation and sharing of knowledge; the semantic web is one of its big research topics.

Peter Mika, a researcher at Yahoo! Research in Barcelona and co-chair of the 2007 International Semantic Web Conference Semantic Web Challenge, has his thoughts about which commercial ventures are poised to first make a go of things. “My suspicion is that there is more business in the U.S. and more research in Europe,” he says. “The startups that are positioned to have a web-scale impact (Twine, Freebase) are American.”

The U.S. indeed has a much superior culture for starting companies, says Dr. Mark Greaves, director of Knowledge Systems Research at Vulcan, Inc., the private investment vehicle for Paul Allen, where he sponsors advanced research in large knowledge bases and advanced web technologies. And while Europeans acknowledge this, he thinks they’re ready to play a little hardball this time around.

Read more

Follow the Money, Part 1

Jennifer Zaino
SemanticWeb.com Contributor

To put a semantic web twist on a memorable line from 1967’s The Graduate: “I want to say one word to you. Just one word: Europe.”

That’s where so much of the research action is when it comes to the semantic web, at least these days. It wasn’t always that way. The U.S. federal government, through DARPA’s (Defense Advanced Research Projects Agency) DAML (DARPA Agent Markup Language) project, built core technologies underpinning the semantic web, with the help of some $45 million in U.S. taxpayer dollars and in close cooperation with European Union-funded semantic web projects. Thus were born what are essentially the standards for the semantic web, along with many of its tools, initial ontologies, editors, and so forth.


But DAML’s job was to show that something is possible, in this instance “essentially to show that we could treat the web as a database and do it at scale,” says Dr. Mark Greaves, one of three DARPA program managers for DAML and now director of knowledge systems research at Vulcan, Inc., the private investment vehicle for Paul Allen, where he sponsors advanced research in large knowledge bases and advanced web technologies. “Their job is not to fund the sort of massive amounts of development that would be needed to take those basic ideas proven in DAML into commercial reality.”

Long-term research is funded by the NSF (National Science Foundation), which has supported a thread of semantic research for a long time, Greaves says. Those efforts include the Marine Metadata Initiative, TANGO (Table Analysis for Semiautomatic Generation of Ontologies), and Scalable Querying and Mining of Graphs. A larger source of federal funding is the Defense Department, through service labs such as the Army Research Laboratory and the Office of Naval Research.

“So those labs, if they see value for their particular branch of military service, provide lots of money to mature technologies,” says Greaves. “That hasn’t seemed to happen for the semantic web, so you don’t see a lot of defense department R&D and technology maturation money swinging behind the semantic web.”

Greaves characterizes the amount of federal funding for the semantic web as modest, guesstimating it at $10 million to $15 million a year, though he is careful to note that, since he is no longer in federal government service, it is difficult to track this with complete certainty. Contrast that with Europe, where Greaves’ back-of-the-envelope calculations figure that in the neighborhood of 50 million euros a year in public funding from the European Commission gets spent on semantic web research.

In the Sixth Framework Programme of the European Community for research and technological development, which ran from 2002 to 2006, Greaves counted 17 semantics-related IT programs. Semantics is, in fact, just a small fraction of what the Seventh Framework Programme of the European Community for research and technological development, covering 2007 to 2013, is spending on information and communications technology of all kinds: more than $1 billion a year. “Our government doesn’t spend anywhere near that amount,” he says. And, Greaves also points out, Europe is the site of two large dedicated multi-site institutes for semantic web research, DERI and the new Semantic Technology Institute International.

Read more

The Web Will SPARQL

Jennifer Zaino
SemanticWeb.com Contributor

And now we have SPARQL. The semantic web query language became official this week with the publication of three SPARQL Recommendations, helping to round out the core technologies needed to realize the vision of the semantic web. (SPARQL is a recursive acronym for SPARQL Protocol and RDF Query Language.)

The other technologies are the Resource Description Framework (RDF), which provides a standard for making statements about resources in the form of subject-predicate-object expressions; the Web Ontology Language (OWL), for building vocabularies; and Gleaning Resource Descriptions from Dialects of Languages (GRDDL), for automatically extracting RDF data from XML and XHTML documents. The latter became a W3C Recommendation in September.
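For readers new to these pieces, here is a minimal sketch of a single subject-predicate-object statement, written with Python’s rdflib library (a choice made purely for illustration; none of the articles here mention rdflib). The person and name are hypothetical.

```python
# A single RDF statement: subject (a resource), predicate (a property),
# object (here a literal value). The person URI and name are made up.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF

g = Graph()
alice = URIRef("http://example.org/people/alice")      # subject
g.add((alice, FOAF.name, Literal("Alice Example")))    # predicate, object

# Serialize the one-triple graph as Turtle to see the statement spelled out.
print(g.serialize(format="turtle"))
```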

With the publication of SPARQL, the stage is set for wider adoption of semantic web standards.


“Trying to use the semantic web without SPARQL is like trying to use a relational database without SQL,” said Tim Berners-Lee, W3C Director, in a statement announcing the news. “SPARQL makes it possible to query information from databases and other diverse sources in the wild, across the web.”

The three SPARQL Recommendations introduced by the W3C RDF Data Access Working Group are:

  • SPARQL Query Language for RDF, which is designed to meet the use cases and requirements identified by the RDF Data Access Working Group in its RDF data access use cases and requirements. These use cases, which according to the W3C describe a user-oriented context in which the RDF query language or protocol or both are used to solve a real problem, make for some interesting reading for anyone who wants to understand practical implications of the semantic web in the real world. On the W3C page devoted to this topic, you can read how, for example, you could have your Bluetooth-equipped car query public RDF storage servers on the Web for a description of current road construction projects, traffic jams, and roads affected by inclement weather.

    In combination with a mapping program in your cell phone, the data retrieved from the servers could be used to plan a different route to work and cut your commute time. At a business level, there are supply chain management applications, outlined by the example of a motorcycle dealer who could query its parts database about a defective part and receive back a human-readable description that provides enough information to obtain a replacement, but also tells the dealer about other, dependent parts that must be replaced at the same time. (A sketch of what such a query might look like appears below, after this list.)

  • SPARQL Protocol for RDF, which the W3C says uses WSDL 2.0 to describe a means for conveying SPARQL queries to a SPARQL query processing service and returning the query results to the entity that requested them.

  • SPARQL Query Results XML Format, for the variable binding and boolean results formats provided by the SPARQL query language for RDF.

There are already 14 known implementations of SPARQL, according to the W3C. Many of them are open source.
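To make the road-conditions use case above a little more concrete, here is a hedged sketch of what such a query might look like, using Python’s rdflib; the traffic vocabulary (ex:RoadWork, ex:affectsRoad, ex:severity) is invented for illustration and is not part of any of the Recommendations.

```python
# A hypothetical road-conditions dataset and a SPARQL query against it.
# The ex: vocabulary is made up for this sketch.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/traffic#> .

ex:work42 a ex:RoadWork ;
    ex:affectsRoad "Main Street" ;
    ex:severity    "high" .

ex:jam7 a ex:TrafficJam ;
    ex:affectsRoad "Harbor Bridge" ;
    ex:severity    "moderate" .
""", format="turtle")

# Ask for every road-work event and how badly it affects the road.
results = g.query("""
PREFIX ex: <http://example.org/traffic#>
SELECT ?road ?severity
WHERE {
    ?event a ex:RoadWork ;
           ex:affectsRoad ?road ;
           ex:severity    ?severity .
}
""")

for road, severity in results:
    print(road, severity)
```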

    Enterprise 3.0: Semweb Commercialization Options

Back when I was an industry analyst (VP, E-Business Strategies at the META Group, since acquired by Gartner), I often had to critique emerging markets. Unlike venture capitalists, industry analysts are privy to product roadmaps from publicly traded companies, including the industry giants (Oracle, SAP, Microsoft, IBM). And unlike i-bankers, they are privy to product roadmaps from start-ups. And as a kicker, some analysts (actually, only those with the largest firms; back then, primarily analysts with Gartner, Forrester, META and Giga) get a lot of great feedback from CIOs and other end users.

    Read more

    Reasoning About Semantics

    Fred Wild
    SemanticWeb.com Contributor

Semantic web concepts carry quite a bit of enthusiasm and hope: the hope that semantic web ways and means can help us make sense of the vast ocean of resources out on the Internet, or perhaps of the smaller seas of resources within our corporate data centers.

Topping the list of “things the semantic web is supposed to provide” is context-sensitive search. Now, I purposefully did not say “semantic search,” simply because I want to describe how to reason about semantics and so I need to use other terms. I chose the term “context” to illustrate how semantics can be applied. Think of ontologies (which describe types of things — or resources — and their properties) as a way of establishing a context. If you adopt a semantic context, the things you find when you search, as well as the properties you uncover about them, belong to that context.

    As such, you can think about the semantics used in searching as a sort of lens through which you see matching resources. Which is to say: without a semantic context, documents are just documents, undistinguished from each other in any contextual way. Perhaps you can search for documents by filetype and perhaps also limit the search to those files containing certain words, but this is a lexical search, not a semantic one.


    To understand the difference, consider that when I search for a dwelling in a certain price range, I want to see all of the things that mean dwelling (home, dwelling, house, residence, condo, cottage, townhouse, …). An ontology describing Real Estate might establish an equivalence between Dwelling and these other classes, so my search finds things of similar meaning, not just the string of characters making up a word.
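As a rough illustration of the kind of equivalence described here, the sketch below declares, in Python’s rdflib, that several classes from a hypothetical real-estate namespace mean the same thing as Dwelling; all of the class names are assumptions made up for the example.

```python
# Declare hypothetical real-estate classes equivalent to Dwelling, so that a
# reasoner-backed search for Dwelling can also match Home, Condo, and so on.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF

RE = Namespace("http://example.org/realestate#")
g = Graph()

for cls in (RE.Home, RE.House, RE.Residence, RE.Condo, RE.Cottage, RE.Townhouse):
    g.add((cls, RDF.type, OWL.Class))
    g.add((cls, OWL.equivalentClass, RE.Dwelling))

print(g.serialize(format="turtle"))
```

Note that rdflib on its own does not apply OWL reasoning; the point is simply that the equivalences live in the ontology, where a reasoner or query rewriter can make use of them.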

    Turning to a document example, I may want to search for all of the documents in my document repository that are legal documents, and even particular types of legal documents. One way I can imagine doing this is to first set my search context to use an ontology describing legal artifacts. Then, using this context, I can ask to see all of the Litigation documents.

It may sound magical, but in fact it is quite mechanical. Within my ontology for legal artifacts, a document is a litigation document if it is either explicitly tagged as such (a member of the class Litigation Document) or matches criteria that infer, with high likelihood, that its contents are those of a litigation document. Although the inference criteria may not be perfect, they can be refined over time. Tuning the criteria lets us find legal documents of a specific type much more easily than a brute-force search would. Also, when we work with a pre-determined set of ontologies, mechanisms can go out ahead of time and apply the criteria to the documents, pre-classifying them according to the inference criteria expressed within those ontologies.
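Here is a hedged sketch of the “show me all of the Litigation documents” request, assuming documents have already been tagged (explicitly or by a pre-classifier) with classes from a hypothetical legal-artifacts ontology; the legal: names are invented.

```python
# Hypothetical pre-classified documents, then a query scoped to the
# LitigationDocument class of an invented legal-artifacts ontology.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix legal: <http://example.org/legal#> .
@prefix dc:    <http://purl.org/dc/elements/1.1/> .

<http://example.org/docs/1> a legal:LitigationDocument ;
    dc:title "Smith v. Jones complaint" .

<http://example.org/docs/2> a legal:Contract ;
    dc:title "Vendor services agreement" .
""", format="turtle")

litigation = g.query("""
PREFIX legal: <http://example.org/legal#>
PREFIX dc:    <http://purl.org/dc/elements/1.1/>
SELECT ?doc ?title
WHERE { ?doc a legal:LitigationDocument ; dc:title ?title . }
""")

for doc, title in litigation:
    print(doc, title)
```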

Inference is most valuable in cases where the authors of documents, or their authoring tools, don’t help very much in establishing the context and meaning of documents. One could support semantic search by supplying meaningful metadata along with the contents of the document. In the future, more sophisticated authoring approaches will have authors doing this, but for now, let’s talk more about discovering meaning and applying classifications to existing documents.

I recently submitted a friend’s resume to an online employee referral site. It was able to scan the uploaded text, pull out the individual’s educational history (among other things), and present it for verification with very good accuracy. This sophisticated scan of the document is an example of extracting facts that go beyond simple word matching and are useful in semantics-based searches. One could later combine a localized ontology of colleges and universities with this capability to express something like, “Show me all of the resumes of candidates who graduated from a Preferred School holding a Postgraduate Degree in an Engineering Discipline” — assuming definitions of Postgraduate Degree, Engineering Discipline and Preferred School within that ontology.
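That sentence maps fairly directly onto a query once such an ontology exists. Here is a hedged sketch using a hypothetical recruiting vocabulary; none of the class or property names below come from the article or from any real site.

```python
# A made-up recruiting dataset and the "preferred school, postgraduate
# engineering degree" request expressed as a SPARQL query against it.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix hr: <http://example.org/recruiting#> .

hr:resume1 a hr:Resume ;
    hr:candidate hr:alice .

hr:alice hr:holdsDegree hr:degree1 .

hr:degree1 a hr:PostgraduateDegree ;
    hr:discipline hr:ElectricalEngineering ;
    hr:awardedBy  hr:StateTech .

hr:ElectricalEngineering a hr:EngineeringDiscipline .
hr:StateTech a hr:PreferredSchool .
""", format="turtle")

matches = g.query("""
PREFIX hr: <http://example.org/recruiting#>
SELECT ?resume
WHERE {
    ?resume a hr:Resume ;
            hr:candidate ?person .
    ?person hr:holdsDegree ?degree .
    ?degree a hr:PostgraduateDegree ;
            hr:discipline ?discipline ;
            hr:awardedBy  ?school .
    ?discipline a hr:EngineeringDiscipline .
    ?school     a hr:PreferredSchool .
}
""")

for (resume,) in matches:
    print(resume)
```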

    Read more

    Building Effective Relationships in Software

One of the many factors that led to the success of the Web was how it defined links in a very simple manner, which encouraged cross-referencing. In the last two installments of this column, I’ve discussed the importance of the identifiers of the Web: URIs. The Web’s native document format, HTML, made it easy to connect one URI to another, and so relationships between documents could grow regardless of who controlled the documents. This of course meant that sometimes such links would break, signaled by the infamous 404 "Not Found" code. Purists of hypertext, the discipline of strongly linked documents, disliked this fragility, but the Web grew, and continues to grow so fantastically, precisely because of the power of simple links that need no strong system of control.

    Read more

    SIOC-ing the Semantic Web

    Jennifer Zaino
    SemanticWeb.com Contributor

    The Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, is the largest semantic web research group in the world, with 100 researchers. Its self-stated mission is to exploit semantics for people, organizations and systems to collaborate and interoperate on a global scale.

In December, John Breslin, research leader of the social software group at DERI, noted that its tutorial proposal on SIOC (Semantically Interlinked Online Communities), which provides methods for interconnecting online discussion channels such as blogs, forums and mailing lists, had been accepted for the 17th International World Wide Web Conference, to be held in Beijing, China, in April. The tutorial is entitled “Interlinking Online Communities and Enriching Social Software with the Semantic Web.”

SemanticWeb.com recently caught up with Breslin to learn more about SIOC, which consists of the SIOC ontology, an open-standard, machine-readable format for expressing the information contained both explicitly and implicitly in Internet discussion methods; of SIOC metadata producers for a number of popular blogging platforms and content management systems; and of storage and browsing/searching systems for leveraging SIOC data.

    SemanticWeb.com: Tell us a bit about the development of SIOC.

    Breslin:

    SIOC [pronounced 'shock'] started off as an idea in my head three or three-and-a-half years ago. Because I had some experience in online communities (boards, etc.), I saw a need for providing methods to link these sites together. When you look for information on the Web to answer a question, you may get parts of your answer from different community sites. You have to trawl across a lot of these sites before you can get a complete answer. We wanted a method to be able to express the information from these communities in a standard form and then to allow this information to be linked together by adding methods for people to say, for example, that this information was written by the same person who wrote something else, or that it is related to something else on the same topic.

    It started off with the development of the SIOC core ontology, which is used to describe the domain of online communities and what they consist of — users and posts and descriptions of other simple terms that occur in online communities. There is a lot of structure in online communities and inherent connections, in that people tag content, make replies or create trackbacks between posts. This structure that is created in online communities is often hidden in some database behind the scenes, and SIOC is used to expose that structure via semantics.

First of all we just worked on SIOC internally and got feedback. Then we decided to get more feedback from the community through a W3C member submission process. We gathered partners in this space — a combination of academic and industry partners — and went through a year or so of getting this submission in place, which involved a lot of revisions. The vocabulary kind of evolved by community consensus. That was published at the end of July or beginning of August, and since then it has helped the initiative, as having a member submission makes it more visible and easier for us to get feedback.

    We will also be presenting a tutorial on SIOC at the WWW2008 conference in Beijing. This is the biggest web conference, so having a tutorial at that is obviously brilliant for us. Combined with the W3C submission, we know that there is significant interest in SIOC, but many people don’t know what it is exactly and what it can be used for. We’ll be explaining in our tutorial what SIOC is, how you can use it, and where it is being used already.

    In what ways is SIOC being used today?

    The initial approach was to provide the SIOC ontology and modules producing SIOC data [based on this ontology] for a lot of open source applications, as a lot of community sites are built on open source tools. So we wanted to provide SIOC functionality for these tools that people could then add to their own sites.

    We started to do this with a couple of modules and applications developed at DERI, and then others began to produce SIOC data creators for their own systems. It’s making its way into commercial applications from OpenLink, Talis and Seesmic. For example, OpenLink DataSpaces uses SIOC as a kind of intermediary layer between users making queries to a variety of underlying community systems. So if you have a lot of community applications, their system lets you access the aggregate view of them.

    There are probably, in terms of open source modules and commercial applications, about 40 to 50 different systems using SIOC data at the moment.
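To give a feel for the kind of structure Breslin describes SIOC exposing, here is a rough sketch of SIOC data for a single forum post, written as Turtle and loaded with Python’s rdflib. The forum, post and user URIs are hypothetical, and the term names (sioc:Forum, sioc:Post, sioc:has_container, sioc:has_creator, sioc:content, sioc:User) reflect one reading of the SIOC vocabulary rather than anything quoted in the interview.

```python
# One forum post described with SIOC terms; the URIs are invented.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix sioc:    <http://rdfs.org/sioc/ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://forum.example.org/> a sioc:Forum .

<http://forum.example.org/post/123> a sioc:Post ;
    sioc:has_container <http://forum.example.org/> ;
    sioc:has_creator   <http://forum.example.org/user/jane> ;
    dcterms:title      "Exporting our boards as SIOC data" ;
    sioc:content       "Here is how we exposed the forum structure..." .

<http://forum.example.org/user/jane> a sioc:User .
""", format="turtle")

print(len(g), "triples of community structure exposed")
```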

    Read more

    Get Down, Get Fuzzy

    Jennifer Zaino
    SemanticWeb.com Contributor

Google’s got one more day to respond to a patent infringement lawsuit filed by Jarg Corp., the semantic search vendor, and Northeastern University, from which Jarg licensed the algorithm that the two parties believe Google uses to serve up every query result and earn every dollar of its ad revenue.

    According to Jarg President and co-founder Michael Belanger, “We’re not out there with a stick to beat them up …but if they are using some functionality that is covered in our patents – and we have hundreds of claims in about ten patents – then we want to politely ask them to pay some reasonable royalty. The idea here is to try to get to the end game sooner, with friendly partners, rather than to be antagonists.”

    Whether Google looks at the lawsuit in the same light – or would be apt to consider Jarg a friendly partner – the bigger discussion may be around what that end game is. As Belanger sees it, it’s not the semantic web as envisioned by the W3C.

    “Our point of view is that a rising tide lifts all boats,” he says, with the rising tide being not only Google’s application of the technology that Jarg believes infringes on its patent, but also the growing interest in semantics that many companies are now exploiting, from Radar Networks to search engine Hakia, which recently closed on another $5 million in funding.

Belanger argues, however, that there’s something missing. In Google’s case, its use of the algorithm that is the subject of Jarg’s suit means that the more complex a user’s query, the more links the user gets back to wade through and guess at. In the case of the W3C semantic web standards and the applications that subscribe to them, Belanger finds fault because they are largely focused on moving information between relational databases in an interactive or interoperable fashion.

    “And – these are some gray hairs from the past – one would say, how is that different from EDI?… Every time you do a project with the standards of the moment it is considered an interoperable model,” he says.

    Away from rigid modeling

    Getting back to that rising boat, there’s an opportunity for people who clearly understand Google and natural language processing and who have begun to understand the W3C’s semantic web standards to also “understand there is a continuum that this next generation requires, and that the fragility of what they are playing with now is not going to be the end of the game,” he says. “To get to the end game, as Tim Berners-Lee and his colleagues put it in the semantic web article in Scientific American in 2001, they have to get all the way down the spectrum to where we are, away from rigid modeling.”

    To clarify, Belanger says that Jarg’s patents bring to the table something in the database world called an unlimited number of pre-computed joins.

    “That means we can have a query more complex by orders of magnitude, and pack more context in there than you can push against a database using the SQL protocol, because there may be three joins of a SQL protocol and the database engine comes to its knees and stops,” he says. “You can’t have a complex query in the SPARQL world, either.”

    Belanger believes that right now, most people are locked into semantic projects in which they build a model and test it with consistency checkers to make sure it behaves in a rigid way, so they can trust the information going back and forth between suppliers and users.

    “That’s basically EDI, but we’re no longer using twisted copper pair wires between two companies, but the Internet. The only difference is the schema is no longer locked up in a database schema at each end,” Belanger says. “It’s now on the table where you can examine it as an ontology, but it is an ontology that defines a rigid model of trusted information going back and forth. It’s only useful in the one narrow case they build it for.”

    And only for as long as the standards prevail.

    “You have to get beyond the W3C standards, beyond the EDI environment, and in some way become extremely fuzzy, and get into sort of a fuzzy AI environment rather than this rigid, fragile modeling stance that they are currently still focused on. We do the fuzzy stuff,” he says.

    Read more

    Open Standards for Data Formats, and Open Data

    Jennifer Zaino
    SemanticWeb.com Contributor

    Two recent announcements point the way to the more open world the semantic web – and the Semantic Web – will require if either is to reach its full potential.

    On the first count, starting last Thursday, any contributions to the microformats wiki (microformats.org) will be placed into the public domain to make reuse of them as easy and as widespread as possible. Those who have already contributed content are being asked to explicitly place those contributions into the public domain by the end of the month, or to remove them.

    Microformats, for those unfamiliar with the term, are positioned as a practical way to add to the semantic richness of the web today. These small bits of code provide a lightweight way of adding simple semantic extensions to web documents, viewable primarily by humans but also understood by machines, to enable the sharing of structured information within web pages. Yahoo! is a big proponent, for example – Yahoo! Tech uses the hReview microformat for all product reviews and Flickr, which it bought in 2005, uses microformats to add location metadata to images.
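For readers who have not seen one, here is a rough sketch of what hReview markup can look like, held in a plain Python string purely for illustration; the class names follow the hReview draft as best as can be recalled here, and the product and reviewer are fictional.

```python
# Illustrative hReview markup; class names are per the hReview draft
# (to the best of recollection), and the review itself is fictional.
HREVIEW_SNIPPET = """
<div class="hreview">
  <span class="item"><span class="fn">Acme Noise-Cancelling Headphones</span></span>
  Rating: <span class="rating">4</span> out of 5.
  <span class="summary">Great sound, snug fit</span>
  Reviewed by <span class="reviewer vcard"><span class="fn">Jane Doe</span></span>.
</div>
"""

print(HREVIEW_SNIPPET)
```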

    What’s the reason behind the microformats wiki development? According to the announcement posted on the site: “By embracing open standards development in the public domain, we hope other standards bodies and communities who choose to call their efforts ‘open’ are encouraged by the example we set here today to do so as well. The importance of open development of standards for data formats cannot be overstated.”

    The notice goes on to say that following posts “will expand on how open standards are essential for open content, data portability, and data longevity.” To explicitly place past contributions into the public domain, members of the microformats community are being asked to edit their user pages to include the Creative Commons Public Domain Declaration template.

    Different approach required

There must be something in the air around the idea of openness, as this announcement comes just a few weeks after another news release on the theme, as it relates to the decidedly upper-case Semantic Web. Talis, the U.K.-based developer of the semantic web application platform Talis Platform, announced in December that it had released the Open Data Commons Public Domain Dedication and Licence, which it called the first output of a successful partnership with the Science Commons project of Creative Commons.

    The company decided to build in partnership upon the principles of the earlier Talis Community License, in part because certain copyright protections of data and databases that are afforded by the European Parliament don’t apply in jurisdictions such as the U.S.

    According to Talis Technology Evangelist Dr. Paul Miller, who was quoted in the release, “A different approach is therefore required if we are to facilitate the widespread availability of data upon which the emerging Semantic Web will depend.”

    The Open Data Commons Public Domain Dedication and Licence should provide a “workable and easy to use solution for data integration that will take care of the relevant rights over data and databases,” according to the release.

    John Wilbanks, Creative Commons’ Vice President responsible for the Science Commons project, was also quoted in the release. He noted that, “For a commercial organisation such as Talis, with a heritage in the business of creating and managing data, to recognise the importance of the ‘freedom to integrate’ says much about changing attitudes to the ownership and use of data.”

    He also noted that the Open Data Commons Licence approach “furthermore implements the norms of data sharing for scientific data, providing the guidance for scientists to act as good citizens without exposing them to lawsuits and lawyers.”

    Indeed, a recent post on the Open Data Commons blog is entitled “2008 – year of open data.”

    With the release of the Science Commons protocol for implementing open access data and the announcement of the CCZero protocol that enables people to assert that a work has no legal restrictions attached to it, as well as waive any rights associated with a work, and with the Talis-funded Open Data Commons project, “it looks like there will be quite a few options on the table for licensing data in an open way this year,” the blog notes. “This is after a long time where there were no good options for those looking at licensing data.”

    Looks like the doors are about to open wide. Let’s see what the upper- and lower-case (S or s)emantic web can make of it.

    Technology Review – TopBraid Composer’s SparqlMotion

Taking a break from ontology visualization, we investigate a novel technology out of TopQuadrant that gets users one step closer to realizing the power of the Semantic Web.

    Read more
