Barbara Starr of Search Engine Land recently wrote, “Ever since the Hummingbird update, there has been a ton of Internet buzz about entity search. What is entity search? How does it work? And what exactly is an ‘entity’? However, the topic of entity search as it relates to e-commerce and Google Shopping has been neglected. Everything you have learned to date about entity search, semantic search and the semantic Web also applies to e-commerce. The big difference in the shopping vertical compared to other search verticals is that all entities searched for are of the same type. Every product in Google is, in fact, an entity of type ‘product.’ It should therefore be treated and optimized as such.” Read more
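To make that concrete: a product entity is typically declared to search engines with schema.org markup. Here is a minimal sketch, built as a Python dict and serialized as JSON-LD for embedding in a page; the product details are invented for illustration.

```python
import json

# Minimal schema.org "Product" markup, serialized as JSON-LD for
# embedding in a page's <script type="application/ld+json"> tag.
# The product details below are invented for illustration.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Acme Trail Running Shoe",
    "sku": "ACME-TR-001",
    "brand": {"@type": "Brand", "name": "Acme"},
    "offers": {
        "@type": "Offer",
        "price": "89.99",
        "priceCurrency": "USD",
    },
}

print(json.dumps(product, indent=2))
```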
MIAMI–(BUSINESS WIRE)–Senzari® today announced MusicGraph, the world’s first knowledge engine for music, which will be available as a consumer app across most major mobile platforms, as well as a powerful “graph API” that developers can leverage to enhance their applications with deep musical intelligence. MusicGraph contains over a billion facts organized into a rich music ontology, which includes acoustical and lyrical features; detailed artist, album and song information; and hundreds of other data points related to user preferences. Read more
The media has been reporting in the last few hours on the Obama administration’s self-imposed deadline for fixing HealthCare.gov. According to these reports, the site is now working more than 90 percent of the time, up from 40 percent in October; pages on the website are loading in less than a second, down from about eight; 50,000 people can use the site simultaneously, and it supports 800,000 visitors a day; and page-load failures are down to under 1 percent.
There’s also word, however, that while the front-end may be improved, there are still problems on the back-end. Insurance companies continue to complain they aren’t getting information correctly to support signups. “The key question,” according to CBS News reporter John Dickerson this morning, “is whether that link between the information coming from the website getting to the insurance company – if that link is not strong, people are not getting what was originally promised in the entire process.” If insurance companies aren’t getting the right information for processing plan enrollments, individuals going to the doctor’s after January 1 may find that they aren’t, in fact, covered.
Jeffrey Zients, the man spearheading the website fix, pointed out at the end of November that work remains to be done on the back end for tasks such as coordinating payments and application information with insurance companies. Plans call for that to be in effect by mid-January.
As it turns out, according to this report in the NY Times, among the components of the site’s back-end technology is the MarkLogic Enterprise NoSQL database, whose recent Version 7 release also added the ability to store and query data in RDF format using SPARQL syntax.
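For the curious, here is a rough sketch of the kind of SPARQL query such a triple store can answer, sent over HTTP from Python. The endpoint path mirrors MarkLogic’s REST conventions but should be treated as a placeholder, and the ex: vocabulary is invented purely for illustration.

```python
import requests

# Hypothetical SPARQL endpoint; MarkLogic 7 exposes SPARQL over REST,
# but the host, port, and path here are placeholders.
ENDPOINT = "http://localhost:8000/v1/graphs/sparql"

# Find every plan enrollment recorded for a given applicant.
# The ex: vocabulary is invented for this sketch.
QUERY = """
PREFIX ex: <http://example.org/schema#>
SELECT ?plan ?insurer
WHERE {
  ?enrollment ex:applicant ?applicant ;
              ex:plan      ?plan ;
              ex:insurer   ?insurer .
  ?applicant  ex:email     "jane@example.com" .
}
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
for row in response.json()["results"]["bindings"]:
    print(row["plan"]["value"], row["insurer"]["value"])
```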
The natural language processing (NLP) market is moving ahead at a steady clip. According to the recently released report, Natural Language Processing Market – Worldwide Market Forecast and Analysis (2013 – 2018), the sector is estimated to grow from $3,787.3 million in 2013 to $9,858.4 million in 2018. That’s an estimated 21 percent CAGR.
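As a quick sanity check, that growth-rate figure follows directly from the report’s own endpoints:

```python
# Verify the report's CAGR claim from its forecast endpoints.
start, end, years = 3787.3, 9858.4, 5  # USD millions, 2013 -> 2018

cagr = (end / start) ** (1 / years) - 1
print(f"CAGR: {cagr:.1%}")  # -> CAGR: 21.1%
```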
The report’s market sizing factors in multiple technologies — recognition technologies such as Interactive Voice Response, Optical Character Recognition, and pattern and image recognition; operational technologies such as auto coding, classification and categorization; text analytics and speech analytics; and machine translation, information extraction, question answering and report generation.
Driving the uptake, the report notes, is the need to enhance customer experiences, especially in an age when the smartphone rules and Big Data predominates. The big industry adopters it cites are healthcare, banking and financial services, and e-commerce, where strong growth in real-time and unstructured customer and transaction data can be harnessed by NLP technology to analyze customer needs and optimize responses to them, cutting some of the human labor costs of doing so.
Samuel Greengard of Baseline Magazine reports, “Few fields benefit more from advanced analytics than health care. Somewhere between the doctor’s office and the somewhat abstract world of epidemiological data is the real world of patients and outcomes. ‘Today, in the outpatient setting, 25 percent of adverse effects are caused by poor or inadequate follow-up of abnormal test results,’ says Carlton Moore, associate professor of medicine at the University of North Carolina School of Medicine. In the past, ‘there had been no simple way to manage the process and improve outcomes.’” Read more
With apologies to Samuel Taylor Coleridge, that’s pretty much the situation many enterprises find themselves in. And it gets harder as more and more documents are stored with, or entirely as, hard-to-index and hard-to-reuse images. How to address the problem? Data Conversion Laboratory (DCL) is trying to make the job easier with the recent introduction of its Automated Conversion System, which takes documents of varying visual quality and imagery and converts them into structured data.
Its technology transforms these documents into searchable XML, with extracted metadata, for storage in, and access by, content-management and other end-user systems.
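DCL hasn’t published its output schema, but as a rough illustration of the general idea, here is a hypothetical sketch in Python of a scanned page rendered as searchable XML with metadata attached; every element and attribute name is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of the kind of searchable XML a scanned page
# might be converted into; DCL's actual schema is not public, so
# every element and attribute name here is invented.
doc = ET.Element("document", id="brochure-0042")

meta = ET.SubElement(doc, "metadata")
ET.SubElement(meta, "title").text = "Maintenance Manual, Rev. C"
ET.SubElement(meta, "source-format").text = "scanned TIFF"

body = ET.SubElement(doc, "body")
section = ET.SubElement(body, "section", heading="Safety Notices")
ET.SubElement(section, "para").text = "Disconnect power before servicing."

print(ET.tostring(doc, encoding="unicode"))
```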
There’s money in that open data. A new report from the McKinsey Global Institute finds that machine-readable information that’s made available to others has the potential to generate significant economic value: $3 trillion annually in seven domains, to be exact.
The report, entitled Open Data: Unlocking Innovation And Performance With Liquid Information, sees the potential economic effect unfolding in education, transportation, consumer products, electricity, oil and gas, health care and consumer finance. Data becomes more liquid, the report’s authors note, when it is open, widely available and in shareable formats, and when advanced computing and analysis can yield novel insights from it, potentially in conjunction with proprietary data. It doesn’t specifically mention Linked Data, but homes in on government open data platforms – including the Linked Data-infused data.gov.uk, which it cites as having had 1.4 million page views this summer – as critical to the economic good tidings. It counts more than 40 countries with open data platforms, and up to 1 million data sets made open by governments worldwide.
Dan Woods of Forbes reports, “After operating in a controlled, stealth beta for an unusually long time, ClearStory Data opened the curtain on its ‘Data Intelligence’ product on Monday, illustrating two powerful lessons that will present a challenge to both big data and BI vendors alike. ClearStory makes use of semantic models to understand the contents of each data source and of domain specific languages and APIs to allow tremendous flexibility in both transforming data and in supporting collaboration. Before addressing how these two techniques perform some usability magic tricks, it is worth pointing out one of the reasons that the stealth mode was so lengthy.” Read more
OXFORD, England, October 30, 2013 /PRNewswire/ — Elsevier, a world-leading provider of scientific, technical and medical information products and services, congratulates the winners of the 2013 Semantic Web Challenge (SWC). Determined by a jury of leading experts in computer semantics from both academia and industry, the winners were announced at the 12th International Semantic Web Conference held in Sydney, Australia, October 21-25. The challenge and allocated prizes were sponsored by Elsevier. Read more
Next week Hadoop World takes place in New York City. The big event follows on the heels of last week’s official gold release of Apache Hadoop 2.0, which significantly overhauls the engine beneath MapReduce, the programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
Sitting on top of the Hadoop Distributed File System (HDFS), YARN (Yet Another Resource Negotiator) is meant to perform as a large-scale, distributed operating system for big data applications. Multiple applications can now run at the same time in Hadoop, with a global ResourceManager and per-node NodeManagers providing a generic system for managing them in a distributed way.
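As a small illustration of that multi-application model, Hadoop 2’s ResourceManager exposes a web services API for inspecting what the cluster is running. A minimal sketch in Python; the host and port are placeholders.

```python
import requests

# The ResourceManager's web services API (Hadoop 2) lists every
# application the cluster is running; host and port are placeholders.
RM = "http://resourcemanager.example.com:8088"

apps = requests.get(
    f"{RM}/ws/v1/cluster/apps",
    params={"states": "RUNNING"},
).json()

# With YARN, these can be MapReduce jobs, Giraph jobs, and other
# application types all running side by side on the same cluster.
for app in apps["apps"]["app"]:
    print(app["id"], app["applicationType"], app["name"])
```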
Among the YARN-ready applications is Apache Giraph, an iterative graph processing system built for high scalability – and the programming framework that helps Facebook with its Graph Search service of connections across friends, subscriptions, and so on, providing the means for it to express a wide range of graph algorithms in a simple way and scale them to massive datasets. Facebook explained in a post in August that it had modified and used Giraph to analyze a trillion edges, or connections between different entities, in under four minutes.
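Giraph itself is a Java framework, but the vertex-centric, “think like a vertex” model it implements is easy to sketch. Here is a toy, single-machine illustration of one such algorithm (connected components via label propagation), not Giraph’s actual API.

```python
# Toy, single-machine sketch of the vertex-centric model Giraph
# scales out: each vertex keeps the smallest vertex ID it has seen
# and passes it to its neighbors until nothing changes (connected
# components via label propagation). Not Giraph's actual Java API.
edges = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
value = {v: v for v in edges}           # each vertex starts with its own ID

changed = True
while changed:                          # one loop pass ~ one BSP superstep
    changed = False
    for vertex, neighbors in edges.items():
        incoming = min(value[n] for n in neighbors)
        if incoming < value[vertex]:    # adopt a smaller component label
            value[vertex] = incoming
            changed = True

print(value)  # -> {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```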