Bob DuCharme recently wrote, “The combination of microdata and schema.org seems to have hit a sweet spot that has helped both to get a lot of traction. I’ve been learning more about microdata recently, but even before I did, I found that the W3C’s Microdata to RDF Distiller written by Ivan Herman would convert microdata stored in web pages into RDF triples, making it possible to query this data with SPARQL. With major retailers such as Walmart and BestBuy making such data available on—as far as I can tell—every single product’s web page, this makes some interesting queries possible to compare prices and other information from the two vendors.” Read more
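To make the microdata-to-triples idea concrete, here is a minimal sketch in Python using only the standard library. It is not the W3C distiller DuCharme describes; the `FlatMicrodataParser` class, the sample HTML, and the SKU value are all hypothetical, and the parser deliberately ignores nested items, `itemid`, and `itemref`. It only shows how `itemscope`, `itemtype`, and `itemprop` attributes can be read off a page as (type, property, value) statements.

```python
from html.parser import HTMLParser

# Hypothetical product snippet; real retailer pages are far larger and nested.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Example Widget</span>
  <meta itemprop="sku" content="ABC-123">
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collect (item_type, property, value) tuples from non-nested microdata."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._item_type = None      # itemtype of the most recent itemscope
        self._pending_prop = None   # itemprop whose text value is being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self._item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:
            # Value carried in an attribute, as with <meta itemprop=... content=...>
            self.triples.append((self._item_type, prop, attrs["content"]))
        else:
            # Value is the element's text; gather it until the next end tag
            self._pending_prop = prop
            self._text = []

    def handle_data(self, data):
        if self._pending_prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending_prop is not None:
            value = "".join(self._text).strip()
            self.triples.append((self._item_type, self._pending_prop, value))
            self._pending_prop = None


parser = FlatMicrodataParser()
parser.feed(SAMPLE_HTML)
triples = parser.triples
for item_type, prop, value in triples:
    print(item_type, prop, value)
```

A real converter would also mint a subject node for each item and emit proper RDF, which is what makes the data queryable with SPARQL.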
Posts Tagged ‘structured data’
Barbara Starr of Search Engine Land recently wrote, “Search engines are evolving. Search is not only becoming faster, it’s becoming more predictive and conversational — more like a personal assistant. In the old days, search engine results pages (SERPs) presented little more than a collection of 10 blue links — the results of a search over web documents. These listings typically consisted of the URL along with a “snippet” of text and perhaps some other information. Search engines became quite adept at determining and displaying relevant and readable snippets.” Read more
Jasmine Pennic of HIT Consultant reports, “Healthline, provider of intelligent health information and technology solutions, today launched its HealthData Engine to harness the power of structured and unstructured data to improve outcomes and reduce costs. The new big data analytics platform leverages the company’s market-leading HealthTaxonomy, advanced clinical natural language processing (NLP) technologies and semantic analysis to turn patient data into actionable insights.” Read more
- In a sample of over 12 billion web pages, 21 percent, or 2.5 billion pages, use schema.org to mark up HTML pages, to the tune of more than 15 billion entities and more than 65 billion triples;
- In that same sample, this works out to six entities and 26 facts per page with schema.org;
- Just about every major site in every major category, from news to e-commerce (with the exception of Amazon.com), uses it;
- Its ontology counts some 800 properties and 600 classes.
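The per-page figures in the list follow from the totals. A quick check in Python, treating the rounded lower bounds quoted above ("over 12 billion", "more than 15 billion", "more than 65 billion") as exact values, which is an assumption made only for the arithmetic:

```python
# Rounded figures from the list above, treated as exact for illustration.
pages_sampled = 12_000_000_000
pages_with_markup = 2_500_000_000
entities = 15_000_000_000
triples = 65_000_000_000

share_of_pages = pages_with_markup / pages_sampled   # fraction of sampled pages
entities_per_page = entities / pages_with_markup     # entities per marked-up page
triples_per_page = triples / pages_with_markup       # facts per marked-up page

print(f"{share_of_pages:.1%} of sampled pages carry the markup")
print(f"{entities_per_page:.0f} entities and {triples_per_page:.0f} facts per page")
```

The share comes out just under 21 percent, and the per-page averages match the six entities and 26 facts cited.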
A lot of schema.org’s adoption has to do with the focus its proponents have had since the beginning on making it very easy for webmasters and developers to adopt and leverage the collection of shared vocabularies for page markup. At this August’s 10th annual Semantic Technology & Business conference in San Jose, Google Fellow Ramanathan V. Guha, one of the founders of schema.org, shared the progress of the initiative to develop one vocabulary that would be understood by all search engines and how it got to where it is today.
James Kobielus of InfoWorld recently shared his thoughts on the best definition for machine learning. He writes, “Increasingly, the term ‘machine learning’ is… beginning to acquire a catch-all status. Or, at the very least, machine learning has become a convenient handle that today’s data scientists use to refer to the wide range of leading-edge techniques for automating knowledge and pattern discovery from fresh data, much of it unstructured. People’s working definitions of machine learning seem to be creeping into broader, vaguer territory. That’s my impression from reading the recent article “Learning and Teaching Machine Learning: A Personal Journey.” In it, author Joseph R. Barr of San Diego State University and True Bearing Analytics discusses both the history of machine learning and his own education in the topic. He states that ‘it’s safe to regard machine learning, data mining, predictive analysis, and advanced analytics as more or less synonymous’.” Read more
RALEIGH, NC and SAN JOSE, CA – May 20, 2014 - TopQuadrant™, a leading semantic data integration company, and Smartlogic, a content intelligence company, today announced a partnership to integrate both parties’ capabilities for linking structured and unstructured data. This strategic alliance will include technology exchange, joint product development and sales collaboration to provide a semantically enabled solution that unifies diverse information across the enterprise.
Overcoming Challenges of Siloed Data (and Thinking)
“One of the ongoing challenges to realizing the insights in big data is that it sits in separate silos – data warehouses, content stores, information feeds and social media, and represents the everyday interaction of human minds,” said Jeremy Bentley, CEO, Smartlogic. “With TopQuadrant’s proven expertise in data virtualization and Smartlogic’s content intelligence, this alliance will deliver a unified view over all the information relevant to the enterprise, regardless of location or type.” Read more
In the winter of 2012, The New York Times began its implementation of the schema.org compatible version of rNews, a standard for embedding machine-readable publishing metadata into HTML documents, to improve the quality and appearance of its search results, as well as generate more traffic through algorithmically generated links. The semantic markup for news articles brought to its web pages structured data properties to define author, the date a work was created, its editor, headline, and so on.
But according to a leaked New York Times internal innovation report, there’s more work to be done in the structured data realm. That work is part of a grand plan to truly put digital first in the face of falling website and smartphone app readership and hotter competition from both old-guard and new-age newsrooms and social media properties, which are transforming how journalism is delivered for an audience increasingly invested in mobile, social, and personalized technologies.
The report was put together with insights from parties including Evan Sandhaus, director for search, archives and semantics at The NY Times, who was instrumental in the rNews/schema.org effort as well as the TimesMachine relaunch, a digital archive of 46,592 issues of The New York Times whose use includes surrounding current news stories with context. While the report notes that the Gray Lady has not been standing still in the face of its challenges, citing newsroom advances to grow audience with efforts such as using data to inform decisions, it needs to do more – faster – to make it easy to get its content in front of digital readers.
Search, Content Analytics, Structured Data Management Have a Hand in Growth of Worldwide Software Market
IDC this week released the latest results from its Worldwide Semiannual Software Tracker, which provides total market size and vendor share for all software technology areas. In 2013, the tracker reports, the worldwide software market grew 5.5 percent year over year to a total market size of $369 billion.
None of the three primary segments that comprise the total software market in IDC’s software taxonomy – Applications; Application Development & Deployment (AD&D); and Systems Infrastructure software – had a standout performance, it says.
But function-specific types of software in these primary segments did. Among these headline acts, the Content Applications subset of the Applications primary market segment had year-over-year growth rates above 10 percent. That market, IDC says, is driven by Search and Content Analytics applications, which grew at 13.2 percent year over year. The Big Data and analytics adoption trend was largely responsible for this market growth, it says.
Yesterday, the Google Webmaster Central blog reported, “We are launching support for schema.org markup to help you specify your preferred phone numbers using structured data markup embedded on your website. Four types of phone numbers are currently supported: Customer service; Technical support; Billing support; Bill payment. For each phone number, you can also indicate if it is toll-free, suitable for the hearing-impaired, and whether the number is global or serves specific countries. Learn how to specify your national customer service numbers.” Read more
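As a rough illustration of what such markup can look like, the sketch below builds a JSON-LD block in Python. The property names (`contactPoint`, `contactType`, `contactOption`, `areaServed`) are taken from the schema.org `ContactPoint` type as I understand it; the organization URL and phone number are placeholders, and the exact values a given search engine accepts should be checked against its own documentation.

```python
import json

# Placeholder organization and number; only the structure matters here.
organization = {
    "@context": "http://schema.org",
    "@type": "Organization",
    "url": "http://www.example.com",
    "contactPoint": [
        {
            "@type": "ContactPoint",
            "telephone": "+1-800-555-0100",
            "contactType": "customer service",  # one of the four supported types
            "contactOption": "TollFree",        # assumed value for a toll-free line
            "areaServed": "US",                 # omit for a number that serves all countries
        }
    ],
}

# Wrap the serialized data in the script tag used to embed JSON-LD in a page.
markup = (
    '<script type="application/ld+json">\n'
    + json.dumps(organization, indent=2)
    + "\n</script>"
)
print(markup)
```

Additional numbers, such as technical or billing support, would be further entries in the `contactPoint` list.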
Sean O’Neill of Tnooz reports, “Last week saw the soft launch of Hopper, the long-awaited consumer trip planning engine that claims to be powered by the ‘world’s largest structured database of travel information’. Since last summer, the site has put wannabe users on a waiting list, allowing only a handful to become beta testers. But as of now, the bouncer’s gone. Anyone can create an account, road-test tools, and book flights. Founded in 2007 and based in Boston and Montreal, the company has 23 full-time employees and has received more than $22 million in funding from backers such as Brightspark, Atlas Venture, and OMERS Ventures. It claims to have breakthrough semantic search technology.” Read more