These vistas will be explored in a session hosted by Kevin Ford, digital project coordinator at the Library of Congress, at next week’s Semantic Technology & Business conference in San Jose. The door is being opened by the Bibliographic Framework Initiative (BIBFRAME) that the LOC launched a few years ago. Libraries will be moving from the MARC standards, their lingua franca for representing and communicating bibliographic and related information in machine-readable form, to BIBFRAME, which models bibliographic data in RDF using semantic technologies.
Posts Tagged ‘search engine’
Schemaless structured document search system SIREn (Semantic Information Retrieval ENgine) has posted some impressive benchmarks from a demonstration of its ability to search complex nested documents. A blog post discusses the test, which indexed a collection of about 44,000 U.S. patent grant documents, averaging 1,822 nested objects per document, and compared Lucene’s Blockjoin capability with SIREn.
The finding for the test dataset: “Blockjoin required 3,077MB to create facets over the three chosen fields and had a query time of 90.96ms. SIREn on the other hand required just 126 MB with a query time of 8.36ms. Blockjoin required 2442% more memory while being 10.88 times slower!”
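The quoted figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the four numbers reported in the quote; the variable names are ours. It suggests the memory ratio is about 24.42 to 1, so Blockjoin's footprint is roughly 2,442% of SIREn's, which works out to about 2,342% more, while the query-time ratio matches the reported 10.88.

```python
# Figures taken from the quoted benchmark result; variable names are illustrative.
blockjoin_mem_mb = 3077.0
siren_mem_mb = 126.0
blockjoin_query_ms = 90.96
siren_query_ms = 8.36

# How many times more memory and time Blockjoin needed than SIREn.
memory_ratio = blockjoin_mem_mb / siren_mem_mb
speed_ratio = blockjoin_query_ms / siren_query_ms

# "X% more" is the ratio minus one, expressed as a percentage.
extra_memory_pct = (memory_ratio - 1.0) * 100.0

print(f"memory ratio: {memory_ratio:.2f}x")
print(f"extra memory: {extra_memory_pct:.0f}% more")
print(f"query time ratio: {speed_ratio:.2f}x")
```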
SIREn, which was launched into its own website and community as part of SindiceTech’s relaunch (see our story here), attributes the results to its use of a fundamentally different conceptual model from the Blockjoin approach. The in-depth technical write-up of the test also explains that, while its focus is Lucene/Solr, the results apply equally to ElasticSearch, which uses Lucene’s Blockjoin under the hood to support nested documents.
The Semantic Web Blog also checked in with SindiceTech CEO Giovanni Tummarello to get a further read on how SIREn has evolved since the relaunch to enable such results, and in other respects.
Amy Gesenhues of Search Engine Land reports, “Yummly announced today it will be powering the results for recipe searches performed on DuckDuckGo, the search engine built on protecting the privacy of its users. ‘Yummly’s technology understands recipe search queries and we’ve worked together to create a great recipe instant answers experience,’ said DuckDuckGo founder and CEO Gabriel Weinberg. According to the announcement, DuckDuckGo is now one of over 4,000 developers and companies currently leveraging Yummly.” Read more
Semantic technology can be part of the fun. Over the next couple of days we’ll look at some ways it can chip in. Let’s start with food as you start thinking about the summer BBQs. There are semantic solutions that can help on various fronts here. Edamam, for example, has built a food ontology that classifies ingredients, nutrients and food that it applies to recipes it scrapes from the web with the help of its natural language processing and machine learning functions.
As you’re breaking out the grill, you can break out the smartphone or iPad to search for grilled burger recipes that incorporate tomatoes in the 200 to 400 calorie range, and take your pick of ranch salmon, Portobello mushroom, turkey with spiced tomato chutney or the classic beef with garden vegetables, for instance. “The nutrition information we append to recipes using natural language processing. This translates into people being able to filter recipes by diet/calories/allergies and be a bit more health-conscious this summer,” says Victor Penev, Edamam founder and CEO.
Sunnyvale, CA (PRWEB) April 17, 2014 — Inbenta, the Semantic Search Engine provider, announces it has closed a $2M Series A funding from a group of investors led by “Amérigo Chile Early Stage and Growth”. Amerigo is an international network of technological Venture Capital funds which forms part of Telefónica’s commitment to boosting technological innovation around the world. Inbenta will use the funds to continue to scale out their A.I. based Semantic Search platform for enterprise customer care solutions while expanding operations worldwide. Read more
Many eyes are turning to research being done by SEO vendor Searchmetrics about the virtues of semantic markup. Exploring the enrichment of search results through microdata integration, it says it has analyzed “tens of thousands of representative keywords, and rankings for over half a million domains from our comprehensive database, for the effect of the use of schema.org markup in terms of dissemination and integration type.”
Its study is still underway but so far its initial findings include good news – that is, that semantic markup succeeds:
- Larger domains are more likely to embrace structured data markup, and the most popular markups relate to movies, offers, and reviews. That said, overall, domains aren’t flocking to integrate Schema HTML tags.
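For readers who haven't seen what such markup looks like, here is a minimal sketch of a schema.org description of a movie with a review. It is emitted as JSON-LD rather than the inline microdata attributes the study discusses, purely because that is easier to show in a few lines. The types and properties (Movie, Review, Rating, Person) come from the schema.org vocabulary; the helper function, movie title, reviewer and rating are made up for illustration.

```python
import json


def movie_review_jsonld(title, reviewer, rating_value, best_rating=5):
    """Build a schema.org Movie description carrying one Review (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "Movie",
        "name": title,
        "review": {
            "@type": "Review",
            "author": {"@type": "Person", "name": reviewer},
            "reviewRating": {
                "@type": "Rating",
                "ratingValue": rating_value,
                "bestRating": best_rating,
            },
        },
    }


# Placeholder values, not taken from the Searchmetrics study.
markup = json.dumps(movie_review_jsonld("Example Movie", "Jane Doe", 4), indent=2)
print(markup)
```

On a web page, the printed JSON would typically sit inside a `<script type="application/ld+json">` element.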
Last week news came from SindiceTech about the availability of its SindiceTech Freebase Distribution for the cloud (see our story here). SindiceTech has finalized its separation from the university setting in which it incubated, the former DERI institute, now a part of the Insight Center for Data Analytics, and now is re-launching its activities, with more new solutions and capabilities on the way.
“The first thing was to launch the Knowledge Graph distribution in the cloud,” says CEO Giovanni Tummarello. “The Freebase distribution showcases how it is possible to quickly have a really large Knowledge Graph in one’s own private cloud space.” The distribution comes instrumented with some of the tools SindiceTech has developed to help users both understand and make use of the data, he says, noting that “the idea of the Knowledge Graph is to have a data integration space that makes it very simple to add new information, but all that power is at risk of being lost without the tools to understand what is in the Knowledge Graph.”
Included in the first round of the distribution’s tools for composing queries and understanding the data as a whole are the Data Types Explorer (in both tabular and graph versions), and the Assisted SPARQL Query Editor. The next releases will increase the number of tools and provide updated data. “Among the tools expected is an advanced Knowledge Graph entity search system based on our newly released SIREn search system,” he says.
A9.com is looking for a Senior Software Engineer – Search Relevance in Palo Alto, CA. The post states, “A9.com, headquartered in Palo Alto, CA, is the search technology subsidiary of Amazon.com. We value all the positive traits of a successful startup — creativity, a fast pace, high energy, and a fun environment where everybody and their work is important. These traits, combined with the history of success and the resources of Amazon.com, are a winning combination for the team and for our customers. Our Search Relevance team is looking for strong and talented engineers who can make a difference in the quality of Amazon’s product search. As part of A9’s Relevance team, you’ll participate in all parts of the R&D process, from experimenting with new ideas and exploring new techniques to implementing features used by millions of Amazon.com customers every day.” Read more
October 14, 2013: The world’s leading hotel price comparison website, HotelsCombined, is pleased to announce a new partnership with acclaimed app-based travel planner Desti, to power the new travel planner’s metasearch and hotel comparison functionality.
Desti, an all-in-one travel app which launched earlier this year in the US, accesses HotelsCombined’s technology to extract live rates and availability for hotels. The information, along with review data, is displayed through the Desti app and is accessible for users to browse, click through and book. Read more
Ryan Lawler of TechCrunch reports, “Over the past several years, Docstoc has positioned itself as the go-to place for small and medium-sized businesses to find the documents they need to grow. Today, it’s taking a big step toward making it easier to do that, with the launch of a major redesign to improve the way its customers find and access its content… According to Docstoc CEO Jason Nazar, Docstoc now has more than 20,000 pieces of professional content, including documents, videos, and tutorials that it’s created since then.” Read more