Archives: July 2008

Is New Search Engine ‘Cuil’ Cool Enough?

Jennifer Zaino Contributor

“Cuil.” You know the word as it is pronounced — cool — as an adjective. But the founders are hoping that someday you’ll know it as a verb, as in “Let me Cuil that.”

Founded by Anna Patterson, the woman who sold her last Internet search engine technology to Google, the company created its new search engine in conjunction with former Google engineers, including her husband. The project, whose search index reportedly spans 120 billion web pages, has been in stealth mode until now. Information on the site notes that, “Cuil searches more pages on the Web than anyone else — three times as many as Google and ten times as many as Microsoft.” According to the Associated Press, which interviewed Patterson, it works like this:

“Rather than trying to mimic Google’s method of ranking the quantity and quality of links to Web sites, Patterson says Cuil’s technology drills into the actual content of a page. And Cuil’s results will be presented in a more magazine-like format instead of just a vertical stack of Web links. Cuil’s results are displayed with more photos spread horizontally across the page and include sidebars that can be clicked on to learn more about topics related to the original search request.”

The search engine company doesn’t use the phrase “semantic web standards” in its own explanation of how the technology works. But clearly it is driven by semantics in the larger sense of the word. According to the web site, “our approach is to focus on the content of a page and then present a set of results that has both depth and breadth. Our aim is to give you a wider range of more detailed results and the opportunity to explore more fully the different ideas behind your search. We think this approach is more useful to you than a simple list. So Cuil searches the Web for pages with your keywords and then we analyze the rest of the text on those pages. This tells us that the same word has several different meanings in different contexts. Are you looking for jaguar the cat, the car or the operating system? We sort out all those different contexts so that you don’t have to waste time rephrasing your query when you get the wrong result.”
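The jaguar example can be sketched with a toy disambiguation routine that picks the sense whose characteristic vocabulary best overlaps the surrounding page text. The sense vocabularies below are invented for illustration and bear no relation to Cuil’s proprietary ranking:

```python
import re

# Invented context vocabularies for each sense of "jaguar";
# a real system would learn these from large text corpora.
SENSES = {
    "jaguar (cat)": {"predator", "rainforest", "spots", "prey", "feline"},
    "jaguar (car)": {"engine", "sedan", "coupe", "dealership", "horsepower"},
    "jaguar (operating system)": {"apple", "mac", "os", "software", "release"},
}

def disambiguate(page_text: str) -> str:
    """Pick the sense whose vocabulary overlaps the page text most."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))
```

A page about "the jaguar stalked its prey in the rainforest" would score highest against the cat sense, while one about sedans and horsepower would resolve to the car.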

Once it has established the context of the pages, the theory is that it’s in a much better position to help users in their search.

How much better? Critics have already been casting stones: the search engine’s servers recently crashed under the load generated by curious searchers, and it has been bashed for some inaccurate connections, such as offering “Hispanic American politicians” as a subcategory for a search on “Obama.” So, nobody’s perfect, though we should note that if you explore that category under “Bill Richardson,” you are indeed directed to content that links Obama and Richardson. More interesting was that the sub-category Presidential Primaries under Obama named only Super Tuesday, the New Hampshire primary, and primary election, while under John McCain the Iowa Caucus was also included. Both listings still leave out a few states, but in most election years those haven’t counted anyway, right? Start your search at United States Presidential Primaries and you get only one link and no sub-category options, a far cry from the rich feast of links Google delivers.

The semantic search engine is backed by $33 million in venture capital. Money well-spent? It may be too early to say, but the Motley Fool seems to have already made up its mind. Tim Byers, in a High-Growth Investing blog posting titled “Google is Cooler Than Cuil,” says this:

“Cuil is an indexer; a Google clone that decided to sift through three times more of the Web’s garbage than others have. What’s to stop Microsoft, Yahoo, or IAC from doing the same? Answer: Nothing.”

Startup Peer39 Eyes Semantic-Influenced Advertising

Jennifer Zaino Contributor

Semantic advertising startup Peer39 — which is backed by $12 million in financing from Canaan Partners, JP Morgan and Dawntreader Ventures — has been recognized by MIT Technology Review as one of the web start-ups to watch this year. The company launched its platform in June, with the aim of leading the next generation of semantic advertising based on both context and behavior.

Founder and CEO Amiad Solomon believes the proprietary algorithms behind its solution will take care of some of the unintended issues that arise when ads are matched only on the basis of random keywords. For example, imagine someone reading a blog post about a trip to Colorado that mentions the author took some great photos with his camera.

“Contextual technologies will serve up an ad for cameras, but the problem is that’s not relevant,” says Solomon. That’s just a mention within a much broader story. “Then, you don’t relate to the camera ad, and so you don’t click on it, so the performance is not great. We are not just interested in specific keywords but looking at the page as a whole and what the page is talking about, and matching to the right advertisement with the page.”

At the same time, its technology also parses sentiment. If a story, for example, comments negatively on a particular product or type of product or manufacturer of a product, it won’t serve up an ad for that company or offering. It relies completely on its ability to “read” pages in real time and understand the user’s engagement with the topic to deliver precisely targeted ads, and eschews the use of cookies or other technologies that might infringe on privacy. “We hope this is revolutionary,” says Solomon.

At a high level, Solomon says, the algorithms assign different weights to all the text on the page based on how words occur in different sentences and the connections between the sentences.
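A crude version of that weighting, assuming nothing about Peer39’s proprietary algorithms, might score each word by the fraction of sentences it appears in, so that a passing mention (the camera in the Colorado blog post) scores well below the page’s running theme:

```python
import re

def topic_weights(page_text: str) -> dict:
    """Weight each word by the fraction of sentences it appears in.
    A word woven through the whole page outranks a one-off mention."""
    sentences = [s.lower() for s in re.split(r"[.!?]+", page_text) if s.strip()]
    words = {w for s in sentences for w in s.split()}
    return {w: sum(w in s.split() for s in sentences) / len(sentences)
            for w in words}
```

On a travel post that mentions Colorado in most sentences and a camera only once, "colorado" ends up with several times the weight of "camera", which is the signal a context-driven ad matcher would act on.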

“We’re not just isolating keywords but looking at the page as a whole and understanding the correlation between keywords on the page and between sentences on the page,” he says. Developing the machine learning technology to scale and to update itself automatically was no mean feat. “It took us a long time to move from the first to the second vertical,” he says. “Now we ping the Net for millions of web pages and analyze them often, so the results get better and better automatically. We insisted on doing this automatically, and now there are more than 1,000 categories covering the entire Net.”

Read more

Semantic Web, the British Way

Jennifer Zaino Contributor

The way students learn in the 21st century is changing. The way they’re being educated has to, as well.

City University, London, is a partner in a new semantic web effort to help effect this change. The university, in conjunction with Cambridge University Center for Applied Research into Educational Technologies (CARET) and the universities of East Anglia, Essex, and Stirling, is working on a project titled Semantic Technologies for the Enhancement of Case-Based Learning, which is being funded by a 1.5 million pound grant by the U.K.’s Economic and Social Research Council (ESRC) and the Engineering and Physical Sciences Research Council.

“There is basic development in the semantic web in City University to develop tools and ontologies but what we are doing is looking at the semantic web’s potential for knowledge construction,” says Uma Patel, co-director of the project and lead researcher at City University, which specializes in business and the professions.

The project will rely on semantic web technologies to help forge interdisciplinary links in the service of tackling tasks such as problem-based and inquiry-based learning.

“Our vision is that the semantic web and teaching and learning in the 21st century is about more than knowledge transmission, but making sense of vast amounts of information. And it is more than enhanced search but using tools to create new knowledge.”

The collaborative project will explore teaching and learning around three case studies in detail: international journalism, maritime operations, and enterprise business innovation. As an example, City runs a master’s program in international journalism that is organized with leading experts in the field, who come in and talk about how a story evolved. Teams of students then interrogate the expert on the case that he presented, and then take up elements of the story and look for other information to give them ways to extend the article, producing a newspaper story, radio broadcast, podcast, and so on.

“The semantic web might come in, in that we might want, after the presentation of a case by an expert, to introduce semantic web tools as a way of exploring the connections between stories that exist on the web, so students can discover them [and use them to create new content]. It’s creating new knowledge, constructing new ideas rather than regurgitating information that is fed to them,” Patel says.

Read more

Breaking into the Semantic Web, Part II

This interview with Eric Miller, President of Zepheira, was conducted by Golda Velez.

SR: Eric, let me ask your advice. The Semantic Web is interesting, exciting, promising. So say I’m a developer, how do I get involved with it? Or suppose I have a tech company, how do I get work in this field?

Read more

Semantic IT Practices

As noted in a previous post, the current set of methodologies employed in the day to day IT operations of a typical enterprise is poised for perhaps its most significant paradigm shift in several decades. This evolutionary shift is not the introduction of Semantic technology or standards per se, but rather the complete re-visioning of how IT works in the context of Semantic Interoperability. Semantic IT provides us with two crucial capabilities that we simply never had before:

Read more

Semantic Integration and IT

In many ways the practice of information technology has changed little over the past 30 years or so. It may not seem so on first appearance – but the premises upon which our current technologies are still operating are largely based on philosophical constructs that date back 30 years or more.

Read more

Using Semantic Web, Social Networks for Trip Planning

Jennifer Zaino Contributor

The Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, and Tourist Republic are collaborating on the development of an intelligent trip planner that leverages semantic web technologies to reduce the amount of data users have to deal with when planning what generally are fairly major undertakings.

For example, say you want to attend a concert with friends somewhere in Europe — not only do you have to get the tickets, but also book a flight, get a hotel, and deal with budget and date constraints.
How can the semantic web help? For one thing, the interactive trip-planning tool will leverage DERI’s BrowseRDF, which allows users to easily navigate arbitrary RDF datasets using an exploration technique called faceted browsing.

“That lets you take an RDF data set and browse through it with faceted browsing techniques,” says John Breslin, research leader of the social software group at DERI. “Imagine you are looking at destinations around the world and then you choose Europe and then countries in Europe, and then sunny places, and then filter down from there. It uses a combination of auto filtering based on people’s profiles and preferences and lets them tweak what they are looking at to reduce the amount of information they are looking at.”
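The narrowing Breslin describes can be sketched as a chain of facet filters. The toy records below stand in for the RDF datasets BrowseRDF actually operates on; each chosen facet value restricts the remaining result set, just as a user clicking through facets would:

```python
# Toy destination records; a real system would draw these from RDF.
destinations = [
    {"name": "Barcelona", "region": "Europe", "climate": "sunny"},
    {"name": "Oslo", "region": "Europe", "climate": "cold"},
    {"name": "Cancun", "region": "Americas", "climate": "sunny"},
]

def facet(results, **chosen):
    """Keep only records matching every chosen facet value."""
    return [r for r in results if all(r.get(k) == v for k, v in chosen.items())]

# Narrow step by step: first Europe, then sunny places within Europe.
europe = facet(destinations, region="Europe")
sunny_europe = facet(europe, climate="sunny")
```

Each step leaves a smaller, more relevant set, which is the point Breslin makes about reducing the amount of information users have to look at.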

Key to the project will be converting Tourist Republic’s dataset of various destinations to RDF format, as well as augmenting it with supplementary information people will want when planning a trip: for example, showing a destination along with what you can do there, what hotels are there, and so on. “We have a data set about places, people and information from Tourist Republic and we can supplement that with information from other public sites. To do that is where the semantic web comes in handy. We can augment the Tourist Republic data set with other public sources of information, such as dbPedia.”
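Converting a destination record to RDF might look like the following sketch, which emits N-Triples syntax. The example.org URIs and property names are invented for illustration, not Tourist Republic’s actual schema:

```python
def to_ntriples(destination: dict) -> list:
    """Turn a flat destination record into N-Triples statements.
    The 'id' field becomes the subject URI; every other field
    becomes a predicate/literal pair."""
    subject = f"<http://example.org/destination/{destination['id']}>"
    triples = []
    for prop, value in destination.items():
        if prop == "id":
            continue
        pred = f"<http://example.org/vocab#{prop}>"
        triples.append(f'{subject} {pred} "{value}" .')
    return triples
```

Once the data is in triple form, it can be merged with other RDF sources (such as dbPedia) simply by loading both sets of statements into the same graph, which is the augmentation Breslin describes.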

Tourist Republic currently stores data on 100,000 destinations worldwide, and its recently released platform allows a wider grouping of location data. Jan Blanchard, CEO of Tourist Republic, says the company will also open some of the data generated by its application through an API.

Breslin notes that another aspect he expects will distinguish this trip planning site from others is the collaborative nature of it. “We’re aiming to let people plan trips together and also try to build on the social network, so people, in terms of who they are connected to, can give and get more relevant recommendations on their trips.” Ideally it will be able to leverage sources such as Yahoo’s Fire Eagle, a way for a user to share his location with sites and services online, and the BrightKite location-based social network to add relevancy to recommendations, though he notes that “a lot of these sites are not critical mass just yet.”

The combination of a smart recommendations engine with a powerful booking engine makes the proposition unique, says Blanchard. The company will generate revenue through a commission on bookings, and is looking to partner with a number of companies in the travel sector for this, he says. “Our priority is to partner with companies that give us access to travel inventories. We will also seek partnerships with travel providers directly, travel social sites and tourist offices,” he notes.

The $200,000 initiative is being funded under Enterprise Ireland’s Innovation Partnership. Nearly all the team is in place now for moving forward, says Breslin. That includes DERI’s Dr. Conor Hayes, who is an expert on recommendation systems, and James Donelan, who has been leading travel technology startups in Silicon Valley, as Tourist Republic’s CTO.

How XML Enables the Semantic Web

Paul Wlodarczyk Contributor

I recently attended the first-ever Linked Data Planet conference, where a number of pioneers in the field of Semantic Web shared their perspectives on the state of the art — and business — of helping the world tag their web pages for meaning. So what is the Semantic Web and how is it different from the web of today? On the web, most search engines today use key words and the number of links to a page to determine the relevance of search results. This is the wisdom of crowds at work: If the key words you are searching for occur often on that page, and the page is popular (i.e. lots of links to it), then it is probably the best bet for what you are searching for.

The downside of this approach is that it can only infer the meaning of the page. On the Semantic Web, the crowds get wiser thanks to the wisdom of authors, who can let the crowds know — in no uncertain terms — what their content means.

For example, when “New York” appears in an HTML document, it could mean New York City, New York State, the Yankees, the Mets, the Giants, the Jets, the song, the strip steak, the state of mind, etc. You get the idea. Words are ambiguous when taken out of context.

If I’m writing about a sporting event, the context of the article lets you know that “New York” means a specific team. The typical search engine, however, doesn’t recognize context. To a search engine, “New York” is just a string that occurs in the document with some frequency.

Key to the Semantic Web is semantic markup, which lets users annotate their web pages with metadata — HTML attributes that don’t get displayed in the document. Semantic metadata describes what the pages are about, letting authors define things with authority and precision.

In my “New York” document, I can state that the document is about the sports team, not the steak. I can do this by tagging the named entities in the document — the people, places, things, events, and facts — in an unambiguous way. I can also set those entities into relationships with each other. If part of my document refers to a player trade between the New York Yankees and Oakland A’s, I can tag the Yankees (entity number one), the A’s (entity number two), and the player trade (an event, but also a relationship between the two named entities).
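One way a crawler could pick up such annotations is to read them out of HTML attributes. The `data-entity` attribute below is purely illustrative of the idea; real deployments use conventions such as RDFa or microformats:

```python
from html.parser import HTMLParser

class EntityExtractor(HTMLParser):
    """Collect entity annotations embedded as HTML attributes."""

    def __init__(self):
        super().__init__()
        self.entities = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-entity" in attrs:
            self.entities.append(attrs["data-entity"])

# A sports story where "Yankees" and "A's" are tagged unambiguously.
doc = ('<p>The <span data-entity="team:NewYorkYankees">Yankees</span> '
       'traded a player to the '
       '<span data-entity="team:OaklandAthletics">A\'s</span>.</p>')
parser = EntityExtractor()
parser.feed(doc)
```

After parsing, `parser.entities` holds the two team identifiers, so a search engine no longer has to guess which “New York” the page means.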

Overcoming the semantic hurdles

While semantic tagging gives documents unambiguous meaning, it has traditionally faced two large hurdles. First, adding semantic markup has been relatively expensive, in terms of either labor or technology. Second, the market for consuming this markup has been small. Both of those hurdles are rapidly falling away.

Let’s address the second point first. Yahoo! has introduced SearchMonkey, a new technology that rates web pages. Rather than use keywords and number of links to the page (the wisdom of crowds), SearchMonkey finds web pages using the semantic markup that is embedded in the page (the wisdom of authors). This creates a substantial motive for adding semantic markup — search engine optimization. Semantic markup makes your content more likely to be found and more relevant to the searcher.

Read more

Linking Data in the Enterprise

Jennifer Zaino Contributor

Semantic web solutions company Zepheira is embarking upon an initiative to bring the benefits of web architecture to the enterprise. It is working on a community project dubbed Linking Enterprise Data (LED), an effort it began in order to take the basic ideas of linked open data and semantic technologies and put them into a form that an enterprise could easily use to make data available within the corporation.

The idea has roots in the Semantic Web’s Linking Open Data project, but, says Zepheira partner Uche Ogbuji, “the word ‘open’ still unfortunately scares some people in business.”

Plus, businesses’ interest is less about converting their cool databases of proprietary data to RDF and exposing them to the masses than about integrating two legacy applications, perhaps mixing in a new set of data, and making it easy to change information flow depending on departmental needs. And they need to do it securely and in a policy-driven manner so that it’s traceable from an information workflow point of view for auditing and regulatory purposes.

Similarly, the idea builds on the fundamental premise of service-oriented architecture (SOA). The problem with most SOA implementations, though, is that they fail to capture the business context of applications consistently, they don’t really capture how the services are organized to work together, and they don’t really account for the relationship between services at a macro level.

“So basically that means people implement SOA with a pre-web mindset,” says Ogbuji; there may be better descriptions of and interfaces to applications, but they are still disconnected and lack a global view of how they relate to each other, the information in them, and the people or assets involved. In contrast, the web world enables more of “an organic discovery process” to get the value you want out of an enterprise information space. LED is an attempt to recreate that in the enterprise. “Let’s align business value and business context with IT but do it using a web pattern that proved so successful at creating a unified information space that people could get what they need instead of being experts at individual destination applications,” says Ogbuji.

Ogbuji acknowledges the idea of opening up enterprise data can seem a bit frightening to IT staff who have spent their lives trying to protect data, and that’s why the LED effort has to stand a bit apart from the semantic web in general or the Linking Open Data movement. Indeed, there are rational reasons for concern, as you do need to monitor information flow in an enterprise for fiduciary or regulatory reasons, and there’s no doubt a lot of the data within a corporation’s four walls is sensitive. “Yes, you have to be careful about plugging in a credit card database to a global network; not every employee should have access to that,” he says. “This can’t become a wild west of information in the course of improving the overall information space.”

Read more

SEO and the Semantic Web

John Clarke Mills Contributor

With the proliferation of the Semantic Web, all of our data will be structured and organized so perfectly that search engines will know exactly what we are looking for, all of the time. Even the newest of newbies will be able to create the most well-structured site that would take tens of thousands of dollars today. Everyone’s information will be so precise and semantically correct there will be no need for Search Engine Optimization anymore!

The fact of the matter is, this is never going to happen. Being a long-time SEO practitioner myself, I am very interested in the ramifications of the Semantic Web on today’s search, especially because I am tasked with optimizing Twine when it first becomes publicly readable this summer.

Before we dive too deep, let’s first look at what SEO experts and professionals do today. In a nutshell, we research, study, and test hypotheses learned by watching the heuristics of a search algorithm. We implement by writing clean and semantically correct HTML in certain combinations in order to allow robots to assess the meaning of a page more easily. We use CSS to abstract the presentation layer, we follow good linking structures, add proper metadata, and write concise paragraphs. We organize our information in a meaningful way to show bots clean, parseable HTML. In some sense we are information architects, in another we are marketers.

But what would happen if a search engine company published its algorithm? Although that probably isn’t going to happen anytime soon, what if it told us exactly what it was looking for? That’s what the Semantic Web is going to do to search. Just the other day Yahoo announced SearchMonkey for just this purpose. It is only going to get bigger. Being told how to mark up your information certainly takes a lot of the guesswork out of it. But in terms of the role of the SEO expert or professional, I don’t think we can retire just yet.

The Semantic Web is organized by people just like the Web of today. The only difference is that now we are going to organize around better standards. Just as people have a hard time organizing their closets, attics, and garages, people have a hard time organizing their websites. Although the Semantic Web will add structure to the Internet, make it easier for novice users to create structured content, and change the way we search, there is still a need for experienced help.

Enter SEO. Some of our roles may have changed, but for the near future there will still be a lot of similarities. The need to study and analyze robot behaviors to better tune information isn’t going away. SEO professionals will still have to stay on top of emerging trends, search technologies, and organic ways to drive traffic. The fact of the matter is, nothing is going to change drastically for a while. In the near term, I am mostly worried about how to integrate Twine into the Web of today.

Not very semantic, huh? Well, that’s not to say we aren’t going to integrate with microformats, display RDF in our pages, and publish our ontology. All of this is extremely important as the Semantic Web emerges; however, in a world where search is run by Google, we have to cater to them. There are a growing number of semantic search engines and document indices out there, which are definitely raising mainstream awareness. Yahoo just jumped on the semantic bandwagon publicly, and you know Google can’t be too far behind.

In conclusion, there’s nothing to worry about anytime soon. The SEO expert’s salary isn’t going back into the company budget. We still have to tune our pages to the beat of Google’s drum for the time being. When things do take a drastic turn, we will adapt and overcome as we always have. That’s what good SEO does. As for me, I will tune Twine just as I used to tune pages over at CNET, following the teachings of Sir Matthew Cutts et al.

This article first appeared on