

Report: Semantic Web Plays Key Role in Net’s Future

Jennifer Zaino
SemanticWeb.com Contributor

What is the future of the Internet? It’s a good question to ask as we head into the last year of the first decade of the 21st century — and it’s a question that the Pew Internet & American Life Project in fact asked Internet leaders, activists and analysts.

In its just-issued report, “The Future of the Internet III,” the survey delivers the perspective of these leading thinkers as it relates to how the Internet will have evolved by 2020.

Semanticweb.com readers won’t be surprised to hear that the semantic web is destined to play an important role in the Internet of tomorrow, but as has been discussed here before, we are talking about an evolution, not a revolution. Most of those surveyed by the Pew Internet & American Life Project envision that the original Internet architecture will still be in place in 2020, continually refined rather than replaced by a new “next-generation” system.

“Those who wrote extended elaborations to their answers projected the expectation that IPv6 and the Semantic Web will be vital elements in the continuing development of the Internet over the next decade,” the report notes.

That harkens to the thoughts of Nova Spivack, CEO and founder of Radar Networks, who summed up the semantic web last year thusly: “I think that the semantic web is an evolution more than a revolution. At first it won’t be as radical a change as some people have hyped it. It will be an iterative, incremental, gradual improvement of all the information tools we use, and that will over time reach a tipping point. But that’s more than ten years away.”

The report also postulates that the Internet in 2020 will be a place of even greater transparency. It would be difficult to think that the Semantic Web — the web of linked data — isn’t going to have a major impact on information transparency, to whatever ends the transparency of people and organizations is put (the report concludes such transparency will not necessarily yield more personal integrity, social tolerance, or forgiveness).


SemantiNet Adds Support for More Yahoo Apps

Jennifer Zaino
SemanticWeb.com Contributor

Startup SemantiNet has extended its relationship with Yahoo since launching its Firefox browser plug-in that helps people discover content they are not actively looking for. That technology, dubbed headup since we reported it here (“SemantiNet Hits the Internet Stage”), is now enabled with Yahoo! Fire Eagle geolocation and BOSS (Build your Own Search Service) support, adding to its support for Flickr, Upcoming and Delicious.

Of leveraging the Fire Eagle API, founder Tai Keinan says the geolocation abilities open up new avenues of interest for users.

“For example, if you are using Fire Eagle and you go to Flickr, [with headup technology] you can see a picture of a nearby place, because it knows your location,” and those images were geotagged with the same location, he says. “Or if you look at a band you can see which events are closest to where you are at. It’s a novel way of leveraging location-based services.”

Leveraging the BOSS open search web services platform API is more of a backend play that aids in SemantiNet’s own ability to produce relevant search results and analysis of content. BOSS gives start-ups like SemantiNet access to Yahoo! crawling and indexing, ranking and relevancy algorithms, and infrastructure to build next-generation search solutions.

According to Keinan, since SemantiNet began utilizing BOSS, the quality of search results has improved significantly. “BOSS is a behind-the-scenes type of instrument,” he says. “It helps us understand key terms and things in different articles to improve our search.” Keinan notes that headup is now able to better distinguish between objects with the same name – Las Vegas, the city, and Las Vegas, the TV show, as an example. “This is considered a really difficult challenge and Yahoo BOSS has really simplified it,” he says.

Keinan also notes that headup is now taking advantage of Yahoo Finance and Yahoo Contacts. How might this work for users of SemantiNet’s service? A Yahoo Contacts user might, for instance, import his contact list to gain more information about leads or friends. Or you might, for instance, see a friend on Facebook and note she works for a particular public company, and from there discover how that company’s stock is doing or other relevant financial data and news articles about the firm (in case it’s time to alert your friend to start looking for a new job!).

“The idea of associated browsing is what we are trying to promote,” he says. “So you are using headup to jump from one object or thing to another, from a person to a company they work for to their product or a similar product. When it’s working as it should, you create a seamless browsing experience where you are always focused on the thing or object that interests you.”

Towards that end, Keinan says that Yahoo is doing a tremendous job in terms of opening up its data sources to out-of-site consumption. “This is in line with the way we view the Web — make it easy to connect users to information without forcing them to go to specific sites.”

Freebase Plows Ahead

Jennifer Zaino
SemanticWeb.com Contributor

Last week saw the debut of Freebase’s Acre integrated application development and hosting environment. But it’s something more, too. Consider it the next step in developer Metaweb’s mission to build up its data and community (see Freebase Reaches Out.). It’s an investment that the company hopes will play a key role in building a community of applications off Freebase data and generating data contributions from their own daily information flows.

“We had developers who were coming and using Freebase as a platform previously, but what we observed were a number of different barriers [some] people were hitting,” says Mike Osborn, Metaweb VP of marketing. Often, they didn’t know where to start. They saw an incredibly data-rich environment and had a vision for using that data to power some interesting applications, but to get up the Freebase learning curve is, he acknowledges, “non-trivial.”

Acre is one answer to Metaweb’s plans to bolster its efforts to have people read interesting data from Freebase and write it back. Among its features is the fact that all code is viewable, and it’s easy to clone or import code from other developers’ applications, as a way to help people collaborate. “The fact that we are hosting it and it’s server-side JavaScript – we believe these are fundamentally important to make it as easy as possible for people to start,” says Osborn.

Osborn points, as an early example of Freebase’s collaborative zeitgeist, to a member’s creation of a Vancouver database, joined by a number of his friends, and the subsequent creation of a set of tools using Google Friend Connect to create a Vancouver Freebase social network to manage projects on the database. “As they load information relating to schools or podcasts [or other things] in Vancouver they have a social network that ties into Freebase and lets them manage their data projects,” he says, and related to that they’re using Google’s custom search application as a mashup to get remarkably clean search results focused on Vancouver.

The concept of writing data to Freebase through the medium of social networks also can be leveraged by larger partners.

“One of the most important constituencies in the Freebase ecosystem are the consumers who simply want to consume or contribute data in onesies and twosies, and we believed all along that the power of Freebase is going to be best enabled when it’s in context,” Osborn says. Say, for instance, a consumer wants to know all the names of all left-handed quarterbacks in NFL history — a consumer may discover that information harnessed from Freebase on a social network they frequent, and maybe they even will want to update it with a forgotten quarterback or two. “So there’s interest in small and large partners in having a tightly reconciled data application piece from Freebase exposed in their particular product flow, and those types of applications will be very compelling for the developers building them and those contributing back to Freebase through their own normal flows and ordinary consumption.”

Over the next year, as Metaweb sees people innovating and creating interesting applications, it plans to foster a more actively managed community that helps people connect and realize there is common work occurring.

“We need to make that a more explicit outbound approach, so people can learn from each other and shorten development cycles,” he says. And it will be looking overall at hardening the infrastructure as well. Freebase doesn’t have the usage to tax it in any meaningful way yet, Osborn says, but that’s something it expects will change as Acre makes its way into users’ consciousness.

ZoomInfo Zooms Marketers to Prospects

Jennifer Zaino
SemanticWeb.com Contributor

ZoomInfo, the source of business information on people and companies, last week launched ZoomInfo Lists. The company calls it a powerful direct marketing tool that provides email, phone, and print direct marketers with the ability to create targeted marketing campaigns from a CAN-SPAM compliant database, updated daily, of millions of heavily indexed people and companies.

Among the features of the new service are the ability for marketers to email targets as many times as they want over the course of a year; the option to focus campaigns to specific audiences based on its 24 categories of information; and in-depth profiles of prospects including their career history, education, and memberships on boards or trade organizations.

What’s behind the new service is ZoomInfo’s semantic search engine and artificial intelligence and natural language algorithms, which CTO William Wechtenhiser says are the force multipliers in helping marketers target their campaigns to those who are likely to be most interested in their offerings, most likely to respond to them, and indeed most likely to want to be found by the marketer.

“We’ve got 50 million people who are very heavily indexed, associated with companies, and we have a lot of information on those companies themselves. So to slice and dice this on semantic principles that are interesting to your business is pretty cool,” he says. Even when this results in smaller lists of prospects, the value is that these are the prospects marketers actually want to contact.

At a very high level, ZoomInfo takes unstructured or semi-structured content off the web and converts it into structured data that can be semantically searched, Wechtenhiser explains. There’s a lot of sloppy stuff on the web, so the challenge is keeping its data complete and accurate. Semantic and natural language technologies such as sentence-based extraction and information unification enable ZoomInfo to make sense out of two different profiles of a person named Tom Smith, for example, so that it can conclude whether they might be the same person.

“There is lots of data on the web that contradicts each other, either because something is old or false or it was a typo,” or for other reasons, Wechtenhiser says. “At the end of the day we get our best guess of who this person is, or who the company is,” so that marketers (or other searchers) are able to get results that correctly correlate that information based on the criteria they set. ZoomInfo looks at hundreds of millions of web pages and gets tens of billions of facts from them. For example, it can see a sentence in a press release that says something like John Smith left Company A and joined Company B, and has a new title, and use that information to update its records so that the existing John Smith’s data is updated rather than duplicated.
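The update-rather-than-duplicate behavior described above can be illustrated with a small Python sketch. This is only an assumption about how such a step might look, not ZoomInfo's actual method: the `PersonRecord` type, the `apply_job_change` helper, the matching rule (same name and same current employer), and the job titles are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PersonRecord:
    name: str
    company: str
    title: str
    history: list = field(default_factory=list)  # earlier (company, title) pairs


def apply_job_change(records, name, old_company, new_company, new_title):
    """Update the matching record in place; create one only if none matches."""
    for rec in records:
        # Hypothetical matching rule: same name and same current employer.
        if rec.name == name and rec.company == old_company:
            rec.history.append((rec.company, rec.title))
            rec.company, rec.title = new_company, new_title
            return rec
    # No existing profile matched, so a new record is added instead.
    rec = PersonRecord(name, new_company, new_title)
    records.append(rec)
    return rec


# Made-up sample data standing in for a fact extracted from a press release.
records = [PersonRecord("John Smith", "Company A", "Director")]
apply_job_change(records, "John Smith", "Company A", "Company B", "Vice President")
```

A real system would need a far more tolerant matching rule than exact name and employer equality, since the article notes that web data is often old, false, or mistyped.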


The Social Semantic Desktop Project Wraps Up

Jennifer Zaino
SemanticWeb.com Contributor

This month marks the conclusion of Nepomuk, the Social Semantic Desktop Project. The three-year project, which was focused on personal information management and sharing desktop data, wraps up having met its goals, which included:

  • Building the architecture, defining the ontologies (accessible at http://www.semanticdesktop.org/ontologies), and bringing the concept to fruition, as well as enabling integration of the technology with a number of existing desktop applications;

  • The APERTURE development framework for getting data and metadata from many common file formats, and creating RDF data;

  • A KDE Linux desktop implementation of Nepomuk’s core concepts; and

  • Four case studies in areas including bioscience, enterprise, and the Linux community to show how the technology can help information management in particular scenarios. For example, a case study was undertaken with the research department of software vendor SAP that revolved around using the technology to support work processes.

    Ansgar Bernardi, deputy head of the Knowledge Management Department at Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI, or the German Research Center for Artificial Intelligence) and Nepomuk’s coordinator, explains the problems Nepomuk aimed to solve. The information people have on their personal computers is stored in a variety of ways, in different file types, as part of different applications, in email folders and browser bookmarks, and so on. To make sense out of that and bring the data you need together is hard.

    “We want to give you the possibility to explicitly describe such relations, to interlink between information across different applications and different file formats, first of all to represent your information and to allow for automated services to help you in your information management,” Bernardi says.

    To this end of building the personal semantic web, the project employed existing Semantic Web standards as far as possible, starting with RDF as the data repository and database technology format, continuing with the idea of ontologies for representing the concepts in which users want to express themselves, and then employing communications protocols to allow interconnections between services.
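As a rough illustration of how RDF-style statements can interlink desktop items across applications and file formats, here is a minimal Python sketch that models triples as plain tuples. It does not use the Nepomuk ontologies or any RDF library; the `EX` namespace, the predicate names, and the `resources_linked_to` helper are hypothetical placeholders.

```python
# Each triple is (subject, predicate, object). All URIs below are made-up
# placeholders, not terms from the actual Nepomuk ontologies.
EX = "http://example.org/desktop/"

triples = {
    (EX + "email/42", EX + "sentBy", EX + "person/alice"),
    (EX + "file/report.pdf", EX + "authoredBy", EX + "person/alice"),
    (EX + "bookmark/7", EX + "about", EX + "project/x"),
    (EX + "file/report.pdf", EX + "about", EX + "project/x"),
}


def resources_linked_to(target, store):
    """Return every subject that points at `target`, whatever the predicate."""
    return sorted(s for (s, _p, o) in store if o == target)


# An email and a PDF come from different applications, yet both are found
# through their shared link to the same person.
alice_items = resources_linked_to(EX + "person/alice", triples)
```

Because every statement has the same three-part shape, a query like this does not need to know which application or file format an item came from.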

    “So you get the possibility to connect and interlink information on your computer regardless of application, file format and data structure,” he says. But as they say, no man is an island. So to share such information, as well as the metadata created on these personal semantic desktops, requires the social semantic desktop, where peer to peer connections enabled by distributed storage and indexing let users find information across different workplaces and personal computers, to exchange data and metadata as they see fit.

    “You have access only to things that have been explicitly shared,” Bernardi says. “So, if you install Nepomuk, you have the possibility to say that for a particular file for a particular concept, share this, and you can even specify with whom to share and no one else will have access.” This works through the use of a public key encryption system.

    As of the middle of November there were more than 10,000 downloads for the Nepomuk tools. The Nepomuk project website, a wiki that contains pointers to a range of information such as public deliverables and publications, is http://nepomuk.semanticdesktop.org, and the prototype for download, technical documentation, source code, and a bug tracker facility are available on the NEPOMUK developer website at http://dev.nepomuk.semanticdesktop.org. Community-specific activities maintain websites of their own; all of them are linked to from the NEPOMUK project website. The KDE developments, for example, can be found at http://nepomuk.kde.org. There are also Nepomuk-Mozilla and Nepomuk-Eclipse implementations of the project underway.

    Life after the Nepomuk project also includes expectations of sustained development within the KDE environment. Bernardi also notes that some of the project partners already have dedicated resources to accompany development beyond the duration of the project in the KDE area, including DERI (Digital Enterprise Research Institute) at the National University of Ireland, Galway.

    Number two on the list is the creation of a dedicated spin-off company which will sell a new PIM tool product and consulting services based on this work; the company is currently being funded, and Bernardi says DFKI has an excellent track record on this front, having spun off more than 50 companies in the past.

    The third activity, he says, is the creation and long term maintenance of a kind of legal body to serve as the communications axis and organizer for meetings and other events associated with the project, targeting industrial customers who want to know about the possibilities of the technology based on the Nepomuk project’s experiences.

    Get Your MediaWiki Hosting Here

    Jennifer Zaino
    SemanticWeb.com Contributor

    As companies and others get onboard the Semantic MediaWiki bandwagon, the number of start-ups offering to host these wikis is on the rise. Major wiki hosting provider Wikia began offering Semantic MediaWiki to all of its wiki sites upon request in March, for example, and in July Referata began offering hosting for SMW-based semantic wikis, including the use of Semantic Forms, Semantic Drilldown, Semantic Calendar, Semantic Google Maps, and some of the other related extensions such as Widgets and Header Tabs.

    What is Semantic MediaWiki? It is GPL-licensed software that is an extension to MediaWiki — the software that runs Wikipedia – which allows for the encoding of semantic data within wiki pages; it provides a basis for managing large amounts of data in MediaWiki, supporting wiki-based data creation, semantic search, and data export.

    Yaron Koren, a founder of Discourse DB (billed as the user-powered database of political commentary) and a freelance SMW consultant, is the creator of Referata, which manages the system administration work around its customers’ sites.

    “Businesses really are starting to use wikis on a large scale,” says Koren, primarily for managing organizations and their people, locations, projects and documents. He’s hopeful that the momentum may shift to companies leapfrogging over traditional wikis in favor of the semantic kind. “I actually think they are easier to use than regular wikis. The nice thing about the MediaWiki suite extensions is that they provide things like forms, so that in addition to providing meaning to data they have to provide structure to data.” So, if a company wants to add information about new employees, for example, it can just do that via forms rather than figuring out wiki markup and what data to put in. “The hope is that the whole semantic Wiki concept in practice is a lot less esoteric than it sounds on paper,” Koren says.

    In his experience, he’s heard from parties who see SMW as an alternative even to offerings such as Microsoft Sharepoint for managing data collaboratively.


    Millions of Hits Force Europeana Portal to Reboot

    Jennifer Zaino
    SemanticWeb.com Contributor

    As Semanticweb.com has been writing about, European governments and institutions are heavily investing in the development of the Semantic Web.

    One effort that is built using semantic web standards is the European portal Europeana.eu, a prototype site billed as a European digital library that will give users direct, multi-lingual access initially to some 2 million digital objects, from film to photos to paintings to manuscripts to archival papers. The site launched November 20, but now it’s a victim of its own popularity: Ten million hits an hour crashed it, and it’s not due to go live again until mid-December, in what is said to be a more robust version.

    The Europeana project is expected by 2010 to give users access to more than 6 million digital items, and is ultimately expected to include a business model to ensure the site’s sustainability. According to the web site, “Europeana is a Thematic Network funded by the European Commission under the eContentplus program, as part of the i2010 policy. Originally known as the European digital library network — EDLnet — it is a partnership of 90 representatives of heritage and knowledge organizations and IT experts from throughout Europe. They contribute to the Work Packages that are solving the technical and usability issues and developing the specifications for the prototype.”

    The project is run by a core team based in the national library of the Netherlands, and builds on the project management and technical expertise developed by The European Library, the site says. The European Library is a portal that enables people to search across 150 million titles, from 172 collections in 31 European national libraries, and is a service of the Conference of European National Libraries.

    Structured metadata is key to contributed content for the portal, which will support RDF triples. Europeana will use the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) harvesting approach. All content aggregators and contributors are required to provide metadata about their resources in unqualified Dublin Core which will be used to build a basic index for simple search, according to the Technical Requirements for providing content. Content aggregators and providers are strongly encouraged to provide more elaborate metadata to enable users to get straight to their content, and so that the portal can build the sophisticated services that users expect, and all data transfer will be based on XML structured files, according to those requirements. Content themes for the prototype include cities, social life, music, crime and punishment, and travel and tourism.
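To make the harvesting approach above more concrete, the following Python sketch builds an OAI-PMH `ListRecords` request for unqualified Dublin Core (`metadataPrefix=oai_dc`) and reads the fields of one Dublin Core record. The verb, argument names, and XML namespaces come from the OAI-PMH and Dublin Core specifications; the repository URL, the sample record, and the helper names are hypothetical and are not taken from Europeana's own tooling.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint; a real provider publishes its own base URL.
BASE_URL = "http://repository.example.org/oai"

# Namespace of the unqualified Dublin Core elements, in ElementTree notation.
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


# Made-up record of the kind a content provider might expose.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Street scene</dc:title>
  <dc:creator>Unknown photographer</dc:creator>
  <dc:type>Image</dc:type>
  <dc:subject>cities</dc:subject>
  <dc:subject>social life</dc:subject>
</oai_dc:dc>"""


def dublin_core_fields(xml_text):
    """Collect Dublin Core elements into {name: [values]} for a simple index."""
    fields = {}
    for child in ET.fromstring(xml_text):
        if child.tag.startswith(DC):
            fields.setdefault(child.tag[len(DC):], []).append(child.text)
    return fields


url = list_records_url(BASE_URL)
record = dublin_core_fields(SAMPLE)
```

Values are kept as lists because Dublin Core elements are repeatable, as the two `dc:subject` entries in the sample show.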

    The problems leading to the temporary closure of the site appear to be related to an unexpected level of interest (it had expected up to 5 million hits per hour and reached 13 million hits at its peak), and a lack of computing capacity to support the high traffic. Three servers were online to support the system; these servers deliver contextual information about the digital items, including a small picture. Once users find what they want by searching this contextual information, they click to get to the full content that is stored on the servers of the respective content contributing institutions.

    “After the site went down for the first time, the Europeana management in The Hague increased computer capacity to deal with 8 million hits per hour,” according to a press release issued on the event. But that wasn’t enough to handle the load, and so “a serious upgrade of computer capacity will be carried out in the coming days and then tested in order to cope with the massive interest from the public.”


    The Semantic Web, Deep in the Heart of Texas

    Jennifer Zaino
    SemanticWeb.com Contributor

    With a three-year, $550,000 grant from the National Science Foundation, UT Dallas researchers will be delving into issues around the scalability of the semantic web, entity resolution, and policy specification and reasoning.

    The project, whose funding was announced in October, joins other semantic web research efforts taking place at UT Dallas funded by agencies such as the Intelligence Advanced Research Projects Activity, as well as collaborative efforts taking place with Raytheon Co. on creating semantic web technology that can find and analyze visual information.

    The principal investigator for the project is Dr. Bhavani Thuraisingham, a professor of computer science in the Erik Jonsson School of Engineering and Computer Science at UT Dallas. UT Dallas has been working with HP Labs in developing the Jena RDF engine, “but one problem is managing large, large graphs,” explains Thuraisingham.

    “It’s very well to talk about the semantic web doing this, that, and the other thing,” but representing all this information requires very large graphs, she said. That’s where UT Dallas’ expertise in data mining, one area of specialization for Thuraisingham’s colleague Latifur Khan, will come in handy.

    The entity resolution problem is one that is capturing the attention of many researchers. When the same word has different interpretations, ontologies are required to sort out what the particular reference is to. She uses her own name as an example of potential confusion: Bhavani is the name of a ferocious goddess in India, a river, and a city or town, as well as her own name. That ambiguity needs to be resolved. While she can’t provide details at this point, she notes that UT Dallas is looking at some clustering techniques as a unique approach to dealing with these issues.

    One of the reasons the government is interested in the semantic web is related to the issues of data sharing, springing from the 9/11 tragedy where information that might have helped target the terrorists in advance of the attacks had not been shared among different agencies. The semantic web opens the door to making the data that users want to get more available, useful and relevant, but what should be shared and policies around how that data should be shared will be an issue, whether it’s within government agencies or among parties such as health care practitioners, insurance providers, and patients, or even within various social networks.


    The Next Generation of Video

    Jennifer Zaino
    SemanticWeb.com Contributor

    As video becomes a bigger and bigger part of the content we consume – call it Video 3.0 – the challenge grows for publishers to tag that content and make it navigable, searchable, and monetizable. Semanticweb.com recently caught up with Alex Castro, CEO of Delve Networks, the semantically enabled and speech recognition savvy video search platform, who was fresh from the Digital Hollywood conference. Here’s what he says some of the buzz is about, both at Digital Hollywood and throughout the industry, as it relates to next-generation online video content.

    On PC to TV video: At Digital Hollywood, Castro said there were a lot of discussions about how to get Internet video to consumers’ TV sets.

    “That’s been something that’s been kicked around but there’s a bit more tangible traction around people solving that problem,” he says. He pointed, as an example, to ClearLeap, which works with video content owners and cable, satellite and IPTV companies to offer a new model of video delivery on the Internet. “Essentially they tie Internet content companies into their system that works with set-top boxes,” says Castro.

    There’s potential for Delve Networks to put its semantic video platform to work with ClearLeap. “Our idea with them is that the publishers using our system can get video piped to ClearLeap, to make sure it’s in the right format so it looks good on TV. This is part of the recognition that video is becoming ubiquitous.” Traction around solutions like ClearLeap’s is good for Delve, Castro says, because it creates demand for publishers to want a video publishing platform.

    On High-Definition video: Castro says there’s been some surprise in the industry at consumer appetite for higher quality video on the Internet, which is leading to requirements to deliver high definition video online. The issue is that many of the consumers who want this don’t have the high-speed connections required to deliver it seamlessly. That doesn’t have an impact on the semantic capabilities Delve can bring to the video searching picture, but it does create challenges in terms of vendors being smarter about who they deliver video to, and how they do it.

    “If there’s a particular user with a terrible connection, and you are trying to give them high-definition video, that ends up in a terrible user experience, where you have rebuffering and jerky video and it’s irritating,” says Castro. “So, with a system like ours, we need to make multiple copies of a video at different bit rates and then try and determine, based on the quality of bandwidth for end users, which of those versions to serve, and maybe even adjust which bit rate you are using throughout as the video plays.”
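The selection step Castro describes, choosing among several encoded copies based on the viewer's bandwidth and adjusting as the video plays, might be sketched in Python as follows. This is a simplified illustration rather than Delve's implementation: the rendition ladder, the `headroom` safety factor, and the function names are assumptions.

```python
# Hypothetical renditions of one video, in kilobits per second.
RENDITIONS_KBPS = [400, 800, 1500, 3000, 6000]


def pick_bitrate(measured_kbps, renditions=RENDITIONS_KBPS, headroom=0.8):
    """Pick the highest rendition that fits within a fraction of the bandwidth."""
    budget = measured_kbps * headroom  # leave slack so playback does not rebuffer
    fitting = [r for r in sorted(renditions) if r <= budget]
    # Fall back to the lowest rendition when even that exceeds the budget.
    return fitting[-1] if fitting else min(renditions)


def plan_playback(bandwidth_samples_kbps):
    """Re-evaluate the choice for each new bandwidth measurement during playback."""
    return [pick_bitrate(sample) for sample in bandwidth_samples_kbps]


# Made-up bandwidth measurements taken at intervals while the video plays.
plan = plan_playback([5000, 2000, 300, 9000])
```

Here a viewer whose connection drops mid-stream is moved down to a lower bit rate instead of being left to rebuffer, and moved back up when bandwidth recovers.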


    Inform Helps Media Giants Monetize the Semantic Web

    Jennifer Zaino
    SemanticWeb.com Contributor

    Last month, The Washington Times became the latest media outlet to start using the services of four-year-old Inform Technologies. The newspaper said it was using Inform’s semantic web product to create topic-specific pages about significant news and newsmakers, power its Dig Deeper feature that helps readers find related themes and related stories, and link its video and multimedia content to articles throughout the site.

    Inform Technologies also counts among its customers the Washington Post, Sports Illustrated, the NY Daily News, CNN, and a number of other publishing sites.

    To CEO James Satloff, a list of such clients paying to use its technology makes Inform the standard in the industry.

    “By having so many different authoritative media use your technology for disambiguation and categorization and textual relevance, that’s how you get to be the standard,” he says.

    Inform does four main things for its clients, which range from established media sources to start-up bloggers, Satloff says. The technology saves them money by providing consistent industry standard tagging of content; compels users to spend longer on a publisher’s site; attracts new unique visitors; and helps them monetize their digital assets.

    Journalists spend between 12 and 17 minutes per story hand-tagging content, Satloff says — and they hate it.

    “If you spend just 15 minutes doing this per story and you pub 100 stories a day that’s 500 man-days a year of just tagging,” he says. With Inform, writers or editors can submit their text and in 200ms get back tagged articles based on what Satloff says is a “phenomenally deep and rich ontology.”

    And Inform offers this part of its services for free, on the premise that better and more consistent tagging across the web is good for the publishers, the industry, and of course, for Inform, too.

    By surfacing valuable contextual links and extracting related topics based on its real-time “reading” of the article, Inform makes it easier for readers to dive further into a publication’s digital assets, keeping them on the site longer by pushing them deeper into areas they didn’t know they wanted to go. Take, for example, an article on Bristol Palin, the daughter of vice presidential candidate Gov. Sarah Palin. In addition to creating automatic links within the text, Inform extracts contextually related topics that might be of interest to people reading the article, even if those words don’t actually appear in the text: stories on childbirth, for instance.

    Inform can also automatically create topic pages that can raise a site’s profile on the search engines, and with new visitors, by bolstering its credentials as an authority on a particular area. For instance, a story about the current crisis in the financial markets ties nicely contextually to a topic page of articles the publication has done on Treasury Secretary Henry Paulson.

    “Like beachfront property, those topics pages are valuable from a user engagement perspective,” says Satloff; visitors go to those pages because they are specifically interested in the area, not because they got there by accident. “And that’s brand new real estate that ads can be sold on. Topic pages create a tremendous increase in the volume of pages that exist for the publisher, some 20 or 30%. And since we know with such precision all the topics that story is about, it’s an easy way to pass contextual hints to ad partners for better monetization.”
