A recent announcement from Freebase reads, “When we publicly launched Freebase back in 2007, we thought of it as a ‘Wikipedia for structured data.’ So it shouldn’t be surprising that we’ve been closely watching the Wikimedia Foundation’s project Wikidata since it launched about two years ago. We believe strongly in a robust community-driven effort to collect and curate structured knowledge about the world, but we now think we can serve that goal best by supporting Wikidata — they’re growing fast, have an active community, and are better-suited to lead an open collaborative knowledge base. So we’ve decided to help transfer the data in Freebase to Wikidata, and in mid-2015 we’ll wind down the Freebase service as a standalone project. Freebase has also supported developer access to the data, so before we retire it, we’ll launch a new API for entity search powered by Google’s Knowledge Graph.” Read more
Ian Harris of Search Engine Journal recently wrote, “Semantic search gives the industry a chance to go back to basics and provide information rather than force it. Let’s take a look at how to embrace semantics.” First off, Harris suggests thinking like a user: “Simply put, if you’re going to optimize for the user, you need to think like the user. In the world of semantics, keywords just don’t cut it… Take the above example. You can see that semantics for a generic term already highlights a wealth of information that a search engine has matched to the keyword. Imagine you are building up your semantic relevance for your delivery service. Optimization on your website should be geared towards information surrounding that service, not only to gain a ranking within relevant SERPs, but to provide answers relevant to your expertise. This could mean you’re providing information about your delivery service, the logistics of your business, and what’s happening in the industry. If the site only focuses on keywords, opportunities will be missed.” Read more
Bob DuCharme recently wrote, “The combination of microdata and schema.org seems to have hit a sweet spot that has helped both to get a lot of traction. I’ve been learning more about microdata recently, but even before I did, I found that the W3C’s Microdata to RDF Distiller written by Ivan Herman would convert microdata stored in web pages into RDF triples, making it possible to query this data with SPARQL. With major retailers such as Walmart and BestBuy making such data available on—as far as I can tell—every single product’s web page, this makes some interesting queries possible to compare prices and other information from the two vendors.” Read more
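DuCharme’s pipeline runs pages through the W3C distiller and then queries the resulting triples with SPARQL. As a rough illustration of what that first step does, here is a minimal, stdlib-only Python sketch that flattens microdata attributes into triple-like tuples. The product name and price are made-up values, and this is not Ivan Herman’s distiller: a real one also handles nested items, itemref, itemid, and proper RDF terms.

```python
from html.parser import HTMLParser

class MicrodataTriples(HTMLParser):
    """Tiny sketch: collect (itemtype, itemprop, value) tuples from
    microdata markup. Ignores nesting, itemref, URL-valued properties."""
    def __init__(self):
        super().__init__()
        self.triples = []
        self._itemtype = None   # type of the current itemscope
        self._prop = None       # itemprop whose text we are buffering
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemscope" in a:
            self._itemtype = a.get("itemtype")
        if "itemprop" in a:
            if "content" in a:  # e.g. <meta itemprop="price" content="...">
                self.triples.append((self._itemtype, a["itemprop"], a["content"]))
            else:               # value is the element's text content
                self._prop = a["itemprop"]
                self._buf = []

    def handle_data(self, data):
        if self._prop:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._prop:
            value = "".join(self._buf).strip()
            self.triples.append((self._itemtype, self._prop, value))
            self._prop = None

# Hypothetical retailer product page fragment (values invented):
page = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Blu-ray Player</span>
  <meta itemprop="price" content="89.99">
</div>
"""

parser = MicrodataTriples()
parser.feed(page)
for triple in parser.triples:
    print(triple)
```

With triples in hand (for real pages, via the distiller into an RDF store), a SPARQL query over the `price` property across two retailers’ pages is what makes DuCharme’s price comparison possible.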
Barbara Starr of Search Engine Land recently wrote, “Search engines are evolving. Search is not only becoming faster, it’s becoming more predictive and conversational — more like a personal assistant. In the old days, search engine results pages (SERPs) presented little more than a collection of 10 blue links — the results of a search over web documents. These listings typically consisted of the URL along with a “snippet” of text and perhaps some other information. Search engines became quite adept at determining and displaying relevant and readable snippets.” Read more
Jasmine Pennic of HIT Consultant reports, “Healthline, provider of intelligent health information and technology solutions, today launched its HealthData Engine to harness the power of structured and unstructured data to improve outcomes and reduce costs. The new big data analytics platform leverages the company’s market-leading HealthTaxonomy, advanced clinical natural language processing (NLP) technologies and semantic analysis to turn patient data into actionable insights.” Read more
- In a sample of over 12 billion web pages, 21 percent, or 2.5 billion pages, use schema.org to mark up HTML pages, to the tune of more than 15 billion entities and more than 65 billion triples;
- In that same sample, this works out to six entities and 26 facts per page with schema.org;
- Just about every major site in every major category, from news to e-commerce (with the exception of Amazon.com), uses it;
- Its ontology counts some 800 properties and 600 classes.
A lot of schema.org’s traction has to do with the focus its proponents have had from the beginning on making it very easy for webmasters and developers to adopt and leverage the collection of shared vocabularies for page markup. At this August’s 10th annual Semantic Technology & Business conference in San Jose, Google Fellow Ramanathan V. Guha, one of the founders of schema.org, shared the progress of the initiative to develop one vocabulary that would be understood by all search engines and how it got to where it is today.
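For what it’s worth, the per-page averages in the figures above follow directly from the totals; a quick back-of-the-envelope check:

```python
# Figures as quoted: in a 12-billion-page sample, 2.5 billion pages
# carry schema.org markup, contributing roughly 15 billion entities
# and 65 billion triples ("facts").
pages_sampled = 12e9
pages_with_markup = 2.5e9
entities = 15e9
triples = 65e9

adoption = pages_with_markup / pages_sampled        # ~0.21, i.e. 21 percent
entities_per_page = entities / pages_with_markup    # 6 entities per marked-up page
facts_per_page = triples / pages_with_markup        # 26 facts per marked-up page

print(f"{adoption:.0%}, {entities_per_page:.0f} entities, {facts_per_page:.0f} facts")
```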
James Kobielus of InfoWorld recently shared his thoughts on the best definition for machine learning. He writes, “Increasingly, the term ‘machine learning’ is… beginning to acquire a catch-all status. Or, at the very least, machine learning has become a convenient handle that today’s data scientists use to refer to the wide range of leading-edge techniques for automating knowledge and pattern discovery from fresh data, much of it unstructured. People’s working definitions of machine learning seem to be creeping into broader, vaguer territory. That’s my impression from reading the recent article “Learning and Teaching Machine Learning: A Personal Journey.” In it, author Joseph R. Barr of San Diego State University and True Bearing Analytics discusses both the history of machine learning and his own education in the topic. He states that ‘it’s safe to regard machine learning, data mining, predictive analysis, and advanced analytics as more or less synonymous’.” Read more
RALEIGH, NC and SAN JOSE, CA – May 20, 2014 - TopQuadrant™, a leading semantic data integration company, and Smartlogic, a content intelligence company, today announced a partnership to integrate both parties’ capabilities for linking structured and unstructured data. This strategic alliance will include technology exchange, joint product development and sales collaboration to provide a semantically enabled solution that unifies diverse information across the enterprise.
Overcoming Challenges of Siloed Data (and Thinking)
“One of the ongoing challenges to realizing the insights in big data is that it sits in separate silos – data warehouses, content stores, information feeds and social media, and represents the everyday interaction of human minds,” said Jeremy Bentley, CEO, Smartlogic. “With TopQuadrant’s proven expertise in data virtualization and Smartlogic’s content intelligence, this alliance will deliver a unified view over all the information relevant to the enterprise, regardless of location or type.” Read more
In the winter of 2012, The New York Times began implementing the schema.org-compatible version of rNews, a standard for embedding machine-readable publishing metadata in HTML documents, to improve the quality and appearance of its search results and to generate more traffic through algorithmically generated links. The semantic markup for news articles brought structured data properties to its web pages to define the author, the date a work was created, its editor, its headline, and so on.
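The Times embedded these properties as microdata in its article pages; purely as an illustration of the same schema.org NewsArticle property set (with made-up values, serialized here as JSON-LD rather than the Times’ actual markup):

```python
import json

# Hypothetical article description using the rNews-derived schema.org
# properties named above: headline, author, editor, dateCreated.
# All values are invented for illustration.
article = {
    "@context": "http://schema.org",
    "@type": "NewsArticle",
    "headline": "Example Headline",
    "author": {"@type": "Person", "name": "Jane Reporter"},
    "editor": {"@type": "Person", "name": "John Editor"},
    "dateCreated": "2012-02-01",
}

jsonld = json.dumps(article, indent=2)
print(jsonld)
```

A search engine that understands schema.org can read the same property set whether it arrives as microdata attributes in the page body or as a JSON-LD block like this one.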
But according to a leaked New York Times internal innovation report that appears here, there’s more work to be done in the structured data realm. The report frames that work as part of a grand plan to truly put digital first in the face of falling website and smartphone app readership, and of hotter competition from both old-guard and new-age newsrooms and social media properties that are transforming how journalism is delivered to an audience increasingly invested in mobile, social, and personalized technologies.
The report was put together with insights from parties including Evan Sandhaus, director for search, archives and semantics at The NY Times, who was instrumental in the rNews/schema.org effort as well as in the relaunch of TimesMachine, a digital archive of 46,592 issues of The New York Times used, among other things, to surround current news stories with historical context. While the report notes that the Gray Lady has not been standing still in the face of its challenges, citing newsroom advances to grow audience such as using data to inform decisions, it concludes the paper needs to do more, faster, to make it easy to get its content in front of digital readers.
Search, Content Analytics, Structured Data Management Have A Hand In Growth Of Worldwide Software Market
IDC this week released the latest results from its Worldwide Semiannual Software Tracker, which provides total market size and vendor share for all software technology areas. In 2013, the tracker reports, the worldwide software market grew 5.5 percent year over year to a total market size of $369 billion.
None of the three primary segments that make up the total software market in IDC’s software taxonomy – Applications; Application Development & Deployment (AD&D); and Systems Infrastructure software – had a standout performance, it says.
But function-specific types of software in these primary segments did. Among these headline acts, the Content Applications subset of the Applications primary market segment had year-over-year growth rates above 10 percent. That market, IDC says, is driven by Search and Content Analytics applications, which grew at 13.2 percent year over year. The Big Data and analytics adoption trend was largely responsible for this market growth, it says.