Linked Data

A Venture Capitalist’s Take on the Internet of Things

David Hirsch, co-founder of Metamorphic Ventures, recently wrote for TechCrunch, “There has been a lot of talk in the venture capital industry about automating the home and leveraging Internet-enabled devices for various functions. The first wave of this was the use of the smartphone as a remote control to manage, for instance, a thermostat. The thermostat then begins to recognize user habits and adapt to them, helping consumers save money. A lot of people took notice of this first-generation automation capability when Google bought Nest for a whopping $3.2 billion. But this purchase was never about Nest; rather, it was Google’s foray into the next phase of the Internet of Things.” Read more

XSB and SemanticWeb.Com Partner In App Developer Challenge To Help Build The Industrial Semantic Web

An invitation was issued to developers at last week’s Semantic Technology and Business Conference: XSB and SemanticWeb.com have joined to sponsor the Semantic Web Developer Challenge, which asks participants to build sourcing and product life cycle management applications leveraging XSB’s PartLink Data Model.

XSB is developing PartLink as a project for the Department of Defense Rapid Innovation Fund. It uses semantic web technology to create a coherent Linked Data model for all part information in the Department of Defense’s supply chain – some 40 million parts strong.

“XSB recognized the opportunity to standardize and link together information about the parts, manufacturers, suppliers, materials, [and] technical characteristics using semantic technologies. The parts ontology is deep and detailed with 10,000 parts categories and 1,000 standard attributes defined,” says Alberto Cassola, VP of sales and marketing at XSB, a leading provider of master data management solutions to large commercial and government entities. PartLink’s Linked Data model, he says, “will serve as the foundation for building the industrial semantic web.”
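For a concrete feel for what such a model involves, here is a minimal sketch in Python with rdflib. The namespace, class, and property names are invented stand-ins for illustration; they are not the actual PartLink vocabulary.

```python
# Illustrative sketch only: the namespace, class, and properties below are
# invented stand-ins, NOT the actual PartLink data model.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, XSD

PART = Namespace("http://example.org/partlink/")  # hypothetical namespace

g = Graph()
g.bind("part", PART)

washer = PART["part/MS35338-44"]  # hypothetical part URI
g.add((washer, RDF.type, PART.Part))
g.add((washer, RDFS.label, Literal("Lock washer, helical spring")))
g.add((washer, PART.category, PART["category/Washers"]))
g.add((washer, PART.material, Literal("Stainless steel")))
g.add((washer, PART.suppliedBy, PART["org/ExampleSupplier"]))
g.add((washer, PART.nominalDiameter, Literal("6.35", datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```

The point of the Linked Data approach is that categories, materials, and suppliers become shared, dereferenceable identifiers that any application in the supply chain can link to, rather than free-text fields.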

Read more

Google Releases Linguistic Data Based on NY Times Annotated Corpus

Photo of New York Times Building in New York City

Dan Gillick and Dave Orr recently wrote, “Language understanding systems are largely trained on freely available data, such as the Penn Treebank, perhaps the most widely used linguistic resource ever created. We have previously released lots of linguistic data ourselves, to contribute to the language understanding community as well as encourage further research into these areas. Now, we’re releasing a new dataset, based on another great resource: the New York Times Annotated Corpus, a set of 1.8 million articles spanning 20 years. 600,000 articles in the NYTimes Corpus have hand-written summaries, and more than 1.5 million of them are tagged with people, places, and organizations mentioned in the article. The Times encourages use of the metadata for all kinds of things, and has set up a forum to discuss related research.”

The blog continues, “We recently used this corpus to study a topic called ‘entity salience’. To understand salience, consider: how do you know what a news article or a web page is about? Reading comes pretty easily to people — we can quickly identify the places or things or people most central to a piece of text. But how might we teach a machine to perform this same task? This problem is a key step towards being able to read and understand an article. One way to approach the problem is to look for words that appear more often than their ordinary rates.”
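That closing idea, flagging words that occur far more often in a document than in ordinary text, can be sketched in a few lines of Python. The background counts below are invented for illustration; a real system would estimate them from a large corpus and go well beyond raw frequency.

```python
from collections import Counter

def salience_scores(doc_tokens, background_counts, background_total):
    """Rank words by how much more frequent they are in this document
    than in a background corpus (a crude proxy for salience)."""
    counts = Counter(doc_tokens)
    doc_total = len(doc_tokens)
    scores = {}
    for word, count in counts.items():
        doc_rate = count / doc_total
        # Add-one smoothing so unseen words do not divide by zero.
        bg_rate = (background_counts.get(word, 0) + 1) / (background_total + 1)
        scores[word] = doc_rate / bg_rate
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy example with invented background statistics.
background = {"the": 60000, "said": 5000, "york": 300, "times": 400}
doc = "the new york times annotated corpus spans twenty years".split()
for word, score in salience_scores(doc, background, 1_000_000)[:3]:
    print(word, round(score, 1))
```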

Read more here.

Photo credit: Eric Franzon

Getty Releases More Linked Open Data: Thesaurus of Geographic Names

Last winter, SemanticWeb reported that the Getty Research Institute had released the first of four Getty vocabularies as Linked Open Data. Recently, the Getty unveiled its second. James Cuno wrote, “We’re delighted to announce that the Getty Research Institute has released the Getty Thesaurus of Geographic Names (TGN)® as Linked Open Data. This represents an important step in the Getty’s ongoing work to make our knowledge resources freely available to all. Following the release of the Art & Architecture Thesaurus (AAT)® in February, TGN is now the second of the four Getty vocabularies to be made entirely free to download, share, and modify. Both data sets are available for download at vocab.getty.edu under an Open Data Commons Attribution License (ODC BY 1.0).”
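Because the vocabularies sit behind a standard SPARQL endpoint, the data can be queried programmatically. Below is a minimal Python sketch against the endpoint Getty documents at vocab.getty.edu; note that the query assumes plain skos:prefLabel, while the Getty data also uses SKOS-XL, and an unindexed string filter like this one is slow, so treat it purely as an illustration of the protocol.

```python
# Minimal sketch of querying the Getty TGN SPARQL endpoint over the standard
# SPARQL protocol. The query shape (plain skos:prefLabel, string filter) is
# a simplifying assumption; consult the Getty documentation for real use.
import requests

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?place ?label WHERE {
  ?place skos:prefLabel ?label .
  FILTER (CONTAINS(LCASE(STR(?label)), "amsterdam"))
}
LIMIT 5
"""

resp = requests.get(
    "http://vocab.getty.edu/sparql",
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["place"]["value"], "->", row["label"]["value"])
```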

Read more

Yahoo Labs Hopes to Change the Future of Content Consumption

Derrick Harris of GigaOM reports, “When it comes to the future of web content… Yahoo might just have the inside track on innovation. I spoke recently with Ron Brachman, the head of Yahoo Labs, who’s now managing a team of 250 (and growing) researchers around the world. They’re experts in fields such as computational advertising, personalization and human-computer interaction, and they’re all focused on the company’s driving mission of putting the right content in front of the right people at the right time. However, Yahoo Labs’ biggest focus appears to be on machine learning, a discipline that can easily touch nearly every part of a data-driven company like Yahoo. Labs now has a dedicated machine learning group based in New York; some are working on what Brachman calls ‘hardcore science and some theory,’ while others are building a platform that will open up machine learning capabilities across Yahoo’s employee base.” Read more

New Opps For Libraries And Vendors Open Up In BIBFRAME Transition

Opportunities are opening up in the library sector, both for the institutions themselves and for providers whose solutions and services can expand in that direction.

These vistas will be explored in a session hosted by Kevin Ford, digital project coordinator at the Library of Congress, at next week’s Semantic Technology & Business Conference in San Jose. The door is being opened by the Bibliographic Framework Initiative (BIBFRAME), which the LOC launched a few years ago. Libraries will be moving from the MARC standards, their lingua franca for representing and communicating bibliographic and related information in machine-readable form, to BIBFRAME, which models bibliographic data in RDF using semantic technologies.
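To give a rough sense of what “models bibliographic data in RDF” means, here is a simplified Python/rdflib sketch of BIBFRAME’s core idea: a Work (the abstract creation) linked to an Instance (a concrete publication). The namespace and property names approximate the early BIBFRAME vocabulary and are assumptions, not a faithful rendering of the official model.

```python
# Simplified, non-authoritative sketch of a BIBFRAME-style description.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

BF = Namespace("http://bibframe.org/vocab/")  # early BIBFRAME namespace
EX = Namespace("http://example.org/bib/")     # hypothetical local data

g = Graph()
g.bind("bf", BF)

work = EX["work/moby-dick"]
instance = EX["instance/moby-dick-1851"]

g.add((work, RDF.type, BF.Work))
g.add((work, RDFS.label, Literal("Moby-Dick")))
g.add((instance, RDF.type, BF.Instance))
g.add((instance, BF.instanceOf, work))  # the edition realizes the work
g.add((instance, BF.publicationDate, Literal("1851")))  # assumed property

print(g.serialize(format="turtle"))
```

Where MARC packs this information into numbered fields and subfields, the RDF version makes each relationship an explicit, web-addressable statement that other datasets can link to.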

Read more

Symplectic Becomes the First DuraSpace Registered Service Provider for the VIVO Project

Research Information recently reported, “Symplectic Limited, a software company specialising in developing, implementing, and integrating research information systems, has become the first DuraSpace Registered Service Provider (RSP) for the VIVO Project. VIVO is an open-source, open-ontology, open-process platform for hosting information about the interests, activities and accomplishments of scientists and scholars. VIVO aims to support open development and integration of science and scholarship through simple, standard semantic web technologies.” Read more

A Look At LOD2 Project Accomplishments

If you’re interested in Linked Data, no doubt you’re planning to listen in on next week’s Semantic Web Blog webinar, Getting Started With The Linked Data Platform (register here), featuring Arnaud Le Hors, Linked Data Standards Lead at IBM and chair of the W3C Linked Data Platform WG and the OASIS OSLC Core TC. It also may be on your agenda to attend this month’s Semantic Technology & Business Conference, where speakers including Le Hors, Manu Sporny, Sandro Hawke, and others will be presenting Linked Data-focused sessions.

In the meantime, though, you might enjoy reviewing the results of the LOD2 Project, the European Commission co-funded effort that ran for four years beginning in 2010, aiming at advancing RDF data management; extracting, creating, and enriching structured RDF data; interlinking data from different sources; and authoring, exploring, and visualizing Linked Data. To that end, why not take a stroll through the recently released Linked Open Data – Creating Knowledge Out of Interlinked Data, edited by LOD2 Project participants Sören Auer of the Institut für Informatik III at the Rheinische Friedrich-Wilhelms-Universität; Volha Bryl of the University of Mannheim; and Sebastian Tramp of the University of Leipzig?
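Of those themes, interlinking is perhaps the easiest to illustrate: decide that records in two datasets describe the same thing and publish an owl:sameAs link between them. Here is a toy Python sketch with invented URIs and a deliberately naive matching rule; real link-discovery tools in the LOD2 stack, such as Silk, support far richer matching.

```python
# Toy illustration of interlinking: match entities across two datasets by
# label and emit owl:sameAs links. The data and rule are invented examples.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

# (URI, normalized label) pairs from two hypothetical sources.
dataset_a = [("http://example.org/a/leipzig", "leipzig")]
dataset_b = [("http://example.net/b/city-leipzig", "leipzig")]

links = Graph()
for uri_a, label_a in dataset_a:
    for uri_b, label_b in dataset_b:
        if label_a == label_b:  # naive exact-label match
            links.add((URIRef(uri_a), OWL.sameAs, URIRef(uri_b)))

print(links.serialize(format="turtle"))
```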

Read more

PredictionIO Raises $2.5M for Open Source Machine Learning Platform

Christopher Tozzi of The VAR Guy reports, “PredictionIO, the open source machine learning platform, has received a big boost with the announcement of $2.5 million in seed funding, which it plans to use to make its automated data interpretation and prediction platform widely available to open source developers. PredictionIO’s goal is to make it easy for developers and companies of all sizes to integrate machine learning —i.e., software that can interpret data intelligently to make automated decisions and predictions—into their products. ‘PredictionIO aims to be the Machine Learning server behind every application,’ according to the company. ‘Building Machine Learning in software will be as common as search soon with PredictionIO’.” Read more

LODLAM Training Day at Semantic Technology & Business Conference

Among the many exciting activities at the 10th Annual Semantic Technology & Business Conference (#SemTechBiz) is the partnership with the Linked Open Data in Libraries, Archives, and Museums (LODLAM) Community. On Tuesday, August 19, 2014, LODLAM will hold a full day of trainings at the SemTechBiz Conference in San Jose, California. Registration information is available here.

We spoke to Jon Voss, Co-Founder of the International LODLAM Summit, about the Training Day:

SemanticWeb.com: What is the LODLAM Training Day?

Jon Voss: The LODLAM Training Day is an all-day, hands-on workshop led by practitioners of Linked Open Data in libraries, archives and museums from around the world.

SW: What can people expect to learn?

JV: We’ve broken the day down into two sections, basically: publishing data and reusing data. The first part of the day we’ll look at ways that libraries, archives and museums are putting massive amounts of structured data online for the public good, and what techniques and tools you can use to do it. The second part of the day we’ll be looking at using this data in different ways, how to use SPARQL queries, how to build data into other mashups, how to use open datasets to improve your own data, etc.
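For readers new to the “reusing data” half of that agenda, a SPARQL query is simply a structured question posed to a public endpoint. The sketch below uses Python’s SPARQLWrapper against DBpedia only because it is a well-known endpoint; LODLAM-relevant endpoints are queried the same way.

```python
# Minimal example of reusing Linked Open Data: run a SPARQL query against a
# public endpoint. DBpedia is used here purely as a familiar example.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?museum ?name WHERE {
      ?museum a dbo:Museum ;
              rdfs:label ?name .
      FILTER (LANG(?name) = "en")
    }
    LIMIT 5
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["name"]["value"])
```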
Read more
