James Kobielus of InfoWorld recently shared his thoughts on the best definition for machine learning. He writes, “Increasingly, the term ‘machine learning’ is… beginning to acquire a catch-all status. Or, at the very least, machine learning has become a convenient handle that today’s data scientists use to refer to the wide range of leading-edge techniques for automating knowledge and pattern discovery from fresh data, much of it unstructured. People’s working definitions of machine learning seem to be creeping into broader, vaguer territory. That’s my impression from reading the recent article ‘Learning and Teaching Machine Learning: A Personal Journey.’ In it, author Joseph R. Barr of San Diego State University and True Bearing Analytics discusses both the history of machine learning and his own education in the topic. He states that ‘it’s safe to regard machine learning, data mining, predictive analysis, and advanced analytics as more or less synonymous’.” Read more
Posts Tagged ‘structured data’
RALEIGH, NC and SAN JOSE, CA – May 20, 2014 – TopQuadrant™, a leading semantic data integration company, and Smartlogic, a content intelligence company, today announced a partnership to integrate both parties’ capabilities for linking structured and unstructured data. This strategic alliance will include technology exchange, joint product development and sales collaboration to provide a semantically enabled solution that unifies diverse information across the enterprise.
Overcoming Challenges of Siloed Data (and Thinking)
“One of the ongoing challenges to realizing the insights in big data is that it sits in separate silos – data warehouses, content stores, information feeds and social media, and represents the everyday interaction of human minds,” said Jeremy Bentley, CEO, Smartlogic. “With TopQuadrant’s proven expertise in data virtualization and Smartlogic’s content intelligence, this alliance will deliver a unified view over all the information relevant to the enterprise, regardless of location or type.” Read more
In the winter of 2012, The New York Times began its implementation of the schema.org compatible version of rNews, a standard for embedding machine-readable publishing metadata into HTML documents, to improve the quality and appearance of its search results, as well as generate more traffic through algorithmically generated links. The semantic markup for news articles brought to its web pages structured data properties to define author, the date a work was created, its editor, headline, and so on.
But according to a leaked New York Times internal innovation report, there is more work to be done in the structured data realm. That work is part of a grand plan to truly put digital first in the face of falling website and smartphone app readership and hotter competition from both old-guard and new-age newsrooms and social media properties, which are transforming how journalism is delivered to an audience increasingly invested in mobile, social, and personalized technologies.
The report was put together with insights from parties including Evan Sandhaus, director for search, archives and semantics at The NY Times, who was instrumental in the rNews/schema.org effort as well as the relaunch of TimesMachine, a digital archive of 46,592 issues of The New York Times whose uses include surrounding current news stories with context. The report notes that the Gray Lady has not been standing still in the face of its challenges, citing newsroom advances to grow audience with efforts such as using data to inform decisions. Even so, it says the paper needs to do more – faster – to make it easy to get its content in front of digital readers.
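The structured data properties mentioned in this post (author, creation date, editor, headline) can be illustrated with a small sketch. This is not The Times’ actual markup: rNews is embedded in the HTML itself, whereas the example below builds the equivalent schema.org description as JSON-LD for brevity. The function name and all sample values are hypothetical.

```python
import json


def news_article_jsonld(headline, author, editor, date_created):
    """Return a schema.org NewsArticle description as a JSON-LD string.

    Hypothetical helper for illustration; the property names
    (headline, author, editor, dateCreated) come from schema.org.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "editor": {"@type": "Person", "name": editor},
        "dateCreated": date_created,  # ISO 8601 date string
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    # Placeholder values, not taken from a real article.
    print(news_article_jsonld(
        headline="Example headline",
        author="A. Writer",
        editor="E. Editor",
        date_created="2014-05-20",
    ))
```

A page would typically carry the resulting JSON inside a script element of type application/ld+json; search engines read the same properties whether they arrive that way or as inline HTML attributes.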
Search, Content Analytics, Structured Data Management Have Hand In Growth Of Worldwide Software Market
IDC this week released the latest results from its Worldwide Semiannual Software Tracker, which provides total market size and vendor share for all software technology areas. In 2013, the tracker reports, the worldwide software market grew 5.5 percent year over year to a total market size of $369 billion.
None of the three primary segments that comprise the total software market in IDC’s software taxonomy – Applications; Application Development & Deployment (AD&D); and Systems Infrastructure software – had a standout performance, it says.
But function-specific types of software in these primary segments did. Among these headline acts, the Content Applications subset of the Applications primary market segment had year-over-year growth rates above 10 percent. That market, IDC says, is driven by Search and Content Analytics applications, which grew at 13.2 percent year over year. The Big Data and analytics adoption trend was largely responsible for this market growth, it says.
Yesterday, the Google Webmaster Central blog reported, “We are launching support for schema.org markup to help you specify your preferred phone numbers using structured data markup embedded on your website. Four types of phone numbers are currently supported: Customer service; Technical support; Billing support; Bill payment. For each phone number, you can also indicate if it is toll-free, suitable for the hearing-impaired, and whether the number is global or serves specific countries. Learn how to specify your national customer service numbers.” Read more
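As a rough illustration of the phone number markup described above, the sketch below assembles a schema.org Organization with ContactPoint entries as JSON-LD. The helper names, the example URL, and the phone numbers are placeholders; the four contact types are the ones listed in the quoted announcement, and the option values (TollFree, HearingImpairedSupported) are schema.org’s ContactPointOption values. Google’s own documentation remains the authority on what it accepts.

```python
import json

# The four contact types named in the announcement quoted above.
SUPPORTED_TYPES = {
    "customer service",
    "technical support",
    "billing support",
    "bill payment",
}


def contact_point(telephone, contact_type, toll_free=False,
                  hearing_impaired=False, area_served=None):
    """Build one schema.org ContactPoint as a plain dict (hypothetical helper)."""
    if contact_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported contact type: {contact_type!r}")
    point = {
        "@type": "ContactPoint",
        "telephone": telephone,
        "contactType": contact_type,
    }
    options = []
    if toll_free:
        options.append("TollFree")
    if hearing_impaired:
        options.append("HearingImpairedSupported")
    if options:
        point["contactOption"] = options
    if area_served:
        # Omitting areaServed leaves the number unrestricted by country.
        point["areaServed"] = area_served
    return point


def organization_contacts_jsonld(url, points):
    """Wrap contact points in an Organization and serialize to JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "url": url,
        "contactPoint": points,
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    # Placeholder URL and fictional 555 numbers.
    points = [
        contact_point("+1-800-555-0100", "customer service",
                      toll_free=True, area_served=["US", "CA"]),
        contact_point("+1-800-555-0101", "technical support",
                      toll_free=True, hearing_impaired=True),
    ]
    print(organization_contacts_jsonld("https://www.example.com", points))
```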
Sean O’Neill of Tnooz reports, “Last week saw the soft launch of Hopper, the long-awaited consumer trip planning engine that claims to be powered by the ‘world’s largest structured database of travel information’. Since last summer, the site has put wannabe users on a waiting list, allowing only a handful to become beta testers. But as of now, the bouncer’s gone. Anyone can create an account, road-test tools, and book flights. Founded in 2007 and based in Boston and Montreal, the company has 23 full-time employees and has received more than $22 million in funding from backers such as Brightspark, Atlas Venture, and OMERS Ventures. It claims to have breakthrough semantic search technology.” Read more
According to a new post by Mariya Moeva on the Google Webmaster Blog, “Since we launched the Structured Data dashboard last year, it has quickly become one of the most popular features in Webmaster Tools. We’ve been working to expand it and make it even easier to debug issues so that you can see how Google understands the marked-up content on your site. Starting today, you can see items with errors in the Structured Data dashboard. This new feature is a result of a collaboration with webmasters, whom we invited in June to register as early testers of markup error reporting in Webmaster Tools. We’ve incorporated their feedback to improve the functionality of the Structured Data dashboard.” Read more
Barbara Starr of Search Engine Land recently wrote, “Ever since the Hummingbird update, there has been a ton of Internet buzz about entity search. What is entity search? How does it work? And what exactly is an ‘entity’? However, the topic of entity search as it relates to e-commerce and Google Shopping has been neglected. Everything you have learned to date about entity search, semantic search and the semantic Web also applies to e-commerce. The big difference in the shopping vertical compared to other search verticals is that all entities searched for are of the same type. Every product in Google is, in fact, an entity of type ‘product.’ It should therefore be treated and optimized as such.” Read more
Nathan Safran of Search Engine World recently wrote, “The early days of search required that users query a search engine in query -> database format. That is, to extricate relevant results, the user must phrase the query in such a way that the machine can understand the request, query the database, and return results. Along the way many have claimed to solve this machine language problem, promising users can use natural language processing – the normal everyday language humans use as opposed to the ‘query a database’ language search has traditionally required (remember Ask Jeeves?).” Read more
Last week Denny Vrandecic left his post as director of Wikidata. The project passed a milestone of 20 million statements last week, and this Monday saw the creation of its fifteen-millionth item, about a Wikipedia category related to beetles. This week also sees Lydia Pintscher, who handles community communications for technical projects including Wikidata, take on the responsibility of product manager for Wikidata.
In a farewell blog post, entitled Data For the People, Vrandecic provided his thoughts about how far Wikidata has come, as well as the possibilities and challenges that lie ahead.
The Semantic Web Blog caught up with Vrandecic, who spoke at the recent SemTech in San Francisco in June, for a little more perspective on the future – Wikidata’s and his own. When it comes to what he’d note as his accomplishments, Vrandecic said that he usually would name the size of the project, citing Wikidata’s community of 3,900 active editors, of whom a third have not been contributors to Wikimedia projects before. “This is, after Wikipedia and Commons, the third-largest Wikimedia project.