Posts Tagged ‘search’

Big Data For Lean Startups, Or, A Poor Man’s Watson

What do big companies have that most emerging businesses don’t to help them get value from Big Data? Well, for starters, lots of money and a ton of technology resources.

Never fear. At the upcoming Semantic Tech & Business conference in Berlin, Christopher Testa, CTO of startup WhiteBox Inc., plans to show companies with far fewer resources than giants like Google and IBM how a small, lean startup can use Big Data. His guidance will draw on his past experience training AdSense at Google; lessons learned from studying the development of IBM’s Watson; and his current work at his own startup, where he is applying Big Data principles with limited engineering resources to build an expert system for amateur radio operator license exams. Most recently Testa was head of engineering at Ad.ly, and that experience will inform his advice on running a data center with free and open source solutions, too.

Semantic Tech & Business Conference Returns to San Francisco

Semantic Tech & Business Conference returns to San Francisco in June! Join us from June 3-7 for complete coverage of Big Data, Linked Data, Extreme Information Management, and Semantic Web. From breakthrough approaches to solving business problems to the big data implications of fast-evolving technologies, SemTechBiz provides you with an unparalleled interactive experience and delivers tangible business value. We're offering a special early rate when you register by February 17. Sign up now!

Lessons Learned On the Road To Linked Data

What’s the path from an XML-based e-government metadata application to a linked data version? At the upcoming Semantic Tech & Business Conference in Berlin, the road taken by the Dutch government will be described by Paul Hermans, lead architect of the Belgian project Erfgoedplus.be, which uses RDF/XML, OWL and SKOS to describe relationships to heritage types, concepts, objects, people, place and time.

Some 1,000 individual organizations make up the Dutch government, each with its own website. A few years ago, an effort to use a search engine to spider those separate websites and provide a single point of access didn’t work as anticipated. The next step toward bringing some order was to assign all the documents published on those sites a common kernel of metadata fields, which led to building an XML application to support a structured approach. Linked Data entered the picture about a year and a half ago.

The Power Is In The Link

Attendees at the fast-approaching Semantic Tech & Business Conference in Berlin will find that one of the opening sessions, The Simple Power of the Link, provides a good introduction to the value proposition of Linked Data.

Presenter Richard J. Wallis is happy to be on the docket early, so that audience members who aren’t coming from a dyed-in-the-wool semantic web background will get a sense of the big-picture benefits and be motivated enough to explore the possibilities that they won’t be scared off by the more technical discussions later in the program. “Later on, when presenters start talking about graph models and SPARQL endpoint performance, hopefully they can harken back to the simple basic benefits I’ll be discussing,” says Wallis, who will be conducting the session as an independent associate on behalf of Kasabi, the Linked Data marketplace from Talis Systems Ltd. Wallis, currently Kasabi technology evangelist, is launching his own semantic web consultancy this month.

The Evolution of Search At Google

Google this week posted a video about the evolution of its search that’s worth the watch. All six minutes of it.

The highlight for semantic web aficionados comes at the end, or should I say, the future. As Google Fellow Amit Singhal explains, his dream has always been to build the Star Trek computer: one that a farmer in India could walk up to and ask when the best time to sow seeds would be this year, given that the monsoons came early. “Our users need much more complex answers,” he says. Questions like that one, he adds, “are all genuine information needs, genuine questions that if we [at Google] can answer, our users would become much more knowledge[able] and they will become more satisfied in their quest for knowledge.”

Antidot’s Open Source db2triples Implements R2RML and Direct Mapping

Antidot, which makes the semantically powered Information Factory and Antidot Finder Suite software, this month released db2triples as an open source software component that implements the W3C RDB2RDF Working Group’s proposed R2RML language and Direct Mapping.

Antidot, in fact, shared with the W3C its experience using Direct Mapping and R2RML to fetch information from hundreds of tables in a client’s Magento ecommerce database and transform it into a graph model, all in just half a day. That’s normally a complex task, says Antidot founder and CEO Fabrice Lacroix, one that would involve data transformation and content indexing of a database whose model is unknown. “No one [here at Antidot] knows the complex, dynamic data model from Magento, and it’s very difficult to reverse-engineer these sort of models,” he says.

“So with Direct Mapping and R2RML it is very easy to go directly from the database to the graph you need…and then extract just the business objects we need. We did it in just half a day. Imagine that. For such complex stuff that’s a very short timeframe.” Lacroix says that the company thought it only fair, after that success, to send something back to the community.
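
For readers new to Direct Mapping, the following is a minimal Python sketch of its default behavior; it is not Antidot’s db2triples code. It assumes a single-column primary key, a hypothetical base IRI (`http://example.com/base/`) and a hypothetical `product` table, maps integers to `xsd:integer` and everything else to plain literals, and leaves out foreign-key references, tables without primary keys and the specification’s fuller encoding rules. Each row becomes a subject IRI built from the table name and key value, gets an `rdf:type` pointing at the table, and yields one triple per non-NULL column. R2RML exists precisely to override these defaults with a custom mapping.

```python
import sqlite3
from urllib.parse import quote

# Hypothetical base IRI; a real deployment would choose its own.
BASE = "http://example.com/base/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def enc(part):
    """Percent-encode one IRI component (simplified relative to the spec)."""
    return quote(str(part), safe="")


def literal(value):
    """Integers become xsd:integer literals; anything else a plain string literal."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def direct_map(conn, table, pk):
    """Yield N-Triples lines for every row of `table`.

    Assumes `pk` is a single-column primary key. The table name is
    interpolated into SQL, so it must come from a trusted source.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    cols = [d[0] for d in cur.description]
    table_iri = BASE + enc(table)
    for row in cur:
        values = dict(zip(cols, row))
        subject = f"<{table_iri}/{enc(pk)}={enc(values[pk])}>"
        yield f"{subject} <{RDF_TYPE}> <{table_iri}> ."
        for col, val in values.items():
            if val is None:  # NULL columns produce no triple
                continue
            yield f"{subject} <{table_iri}#{enc(col)}> {literal(val)} ."


if __name__ == "__main__":
    # Hypothetical sample table standing in for a real e-commerce schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, sku TEXT)")
    conn.execute("INSERT INTO product VALUES (1, 'Widget', NULL)")
    for triple in direct_map(conn, "product", "id"):
        print(triple)
```

Run against the sample row, this emits three triples: the type triple, one for `id` and one for `name`; the NULL `sku` column is skipped.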

Latent Semantic Analysis Helps Assess Health Concerns of Military Personnel

Military personnel are likely familiar with The Millennium Cohort study, which began in the late 1990s to evaluate the effect of service on long-term health. Beyond the service that thousands of men and women in uniform have already given their country, many of those who participated in the 2001-2003 and 2004-2006 survey cycles may also help advance the analysis of qualitative survey results, which could further epidemiological research.

Researchers have released the results of applying latent semantic analysis to an open-ended question in The Millennium Cohort study. The question asked respondents to describe any additional health concerns, in as much detail as they liked, on any health subject not otherwise covered. In October the researchers published the report, Application of Latent Semantic Analysis for Open-Ended Responses in a Large, Epidemiologic Study, which found significantly lower self-reported general health among the almost 28,000 Millennium Cohort respondents who answered the open-ended question than among the nearly 80,000 participants who did not.
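
The report’s own preprocessing, term weighting and choice of dimensions are not described here, so the following is only a minimal sketch of the core latent semantic analysis idea, assuming NumPy and raw term counts: build a term-by-document matrix, keep the top `k` singular values of its SVD, and compare responses by cosine similarity in the reduced space. The four responses are invented for illustration and are not study data.

```python
import numpy as np


def lsa_document_vectors(docs, k=2):
    """Project each document into a k-dimensional LSA space via truncated SVD."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document matrix of raw counts: rows are terms, columns are documents.
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            A[index[w], j] += 1
    # A = U @ diag(s) @ Vt; keep only the k largest singular values.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T  # one row per document


def cosine(u, v):
    """Cosine similarity between two document vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


if __name__ == "__main__":
    # Invented free-text responses, not data from the study.
    responses = [
        "back pain knee pain",
        "knee injury back injury",
        "sleep problems insomnia",
        "insomnia and sleep",
    ]
    vecs = lsa_document_vectors(responses, k=2)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

In this toy example the first two responses share only "back" and "knee", yet they point in essentially the same direction in the two-dimensional space, while the sleep-related responses end up orthogonal to them. That grouping of related wording is what makes LSA useful for sorting large volumes of free-text answers.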

Contextual Analysis Tool Could Have Helped Pinpoint U.K. Riot Locations

About this time last year The Semantic Web Blog introduced readers to a U.K.-based startup called Blueflow Ltd. and its BrandAura software. The social media contextual analytics technology and services were aimed at helping marketing and branding pros understand commentary about their brands across the web, by determining the words used in context with a product or brand and comparing it with competing offerings.
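
The article does not describe how BrandAura works, so this is not its method. As a rough illustration of what "determining the words used in context with a product or brand" can mean, the sketch below counts the words that appear within a fixed window of tokens around each brand mention. The window size, the tokenizer and the brand names `Acme` and `Zenith` are all hypothetical.

```python
import re
from collections import Counter


def context_words(posts, brand, window=3):
    """Count words occurring within `window` tokens of each mention of `brand`."""
    brand = brand.lower()
    counts = Counter()
    for post in posts:
        # Simple hypothetical tokenizer: lowercase alphanumeric runs.
        tokens = re.findall(r"[a-z0-9']+", post.lower())
        for i, tok in enumerate(tokens):
            if tok == brand:
                nearby = tokens[max(0, i - window): i + window + 1]
                counts.update(t for t in nearby if t != brand)
    return counts


if __name__ == "__main__":
    # Invented posts mentioning two hypothetical brands.
    posts = [
        "Love my Acme phone, battery is great",
        "Acme battery died again",
        "Switched from Acme to Zenith, better camera",
    ]
    print(context_words(posts, "Acme").most_common(3))
    print(context_words(posts, "Zenith").most_common(3))
```

Running the same count for a competitor’s name and comparing the two `Counter` objects gives a crude version of the competitive comparison described above.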

Well, the tool found a new use this month: the unfortunate rioting in the U.K. “Our analysis of social media was able to predict which locations were being targeted by the rioters before they attacked those locations,” reports Dr. Andrew Starkey, co-founder and technology director, via email. BBC News reported this week that the police were using Twitter and the BlackBerry Messenger network to monitor streams and comments with the goal of picking up intelligence on locations rioters were possibly targeting, among them the Olympic site, Oxford Street and Westfield shopping centers. But that intelligence, acting Metropolitan Police Commissioner Tim Godwin admitted, could also be misleading.

Whatever Happened To … And What’s That About a Semantic UI-Inspired Tablet?

Will some semantic web mysteries soon be solved? There’s been chatter in the last few weeks that semantic technologies acquired by some giants in the IT space will at last see the light of day in their respective platforms. At the same time, the publicity engine has been hard at work for an upcoming tablet that also promises some semantic goodness.

Let’s start with Google, which in the spring closed its acquisition of ITA Software, a move that was expected to help it bring semantics to travel booking thanks to ITA’s Matrix airfare search engine. Earlier this month rumors began circulating, via TechCrunch, that Google would launch an integrated ITA flight search product within a few weeks. It could include features such as map-based search and information on flights, times and prices based on general search terms, such as "ski trip," and on users’ IP addresses, which could bring up options from Colorado or Utah, for instance. It’s a few weeks later, and we’re still waiting.

In Blogging Space, Spoils Go To Early Posters

The breaking news is a new service from Regator, the result of tweaking its semantic algorithms to find emerging stories within its human-curated collection of web content and to quickly alert bloggers and journalists to them via a desktop app.

Regator has actually been around as a curated blog directory and search engine for a couple of years, and is in and of itself a perfectly good and pretty fast source for the word on the digital street, along with the other usual suspects (Twitter, Facebook, Google blogs, CNN, etc.), at least insofar as the big stories go. “But unless it’s a really big story and Twitter explodes eventually, you won’t find those second-tier news stories so easily,” says Scott Lockhart, Regator cofounder and CEO.

Google, Yahoo! and Bing Announce Schema.org

[Revised and re-posted at 4:03pm EST]

In a collaborative effort reminiscent of sitemaps.org, Google, Yahoo! and Bing have announced the launch of schema.org. Perhaps the most significant aspect of this announcement is the particular standard they have focused on: namely, microdata.

In the Google announcement, Kavi Goel and Pravir Gupta of Google’s search team say, “Historically, we’ve supported three different standards for structured data markup: microdata, microformats, and RDFa. We’ve decided to focus on just one format for schema.org to create a simpler story for webmasters and to improve consistency across search engines relying on the data.”
