Posts Tagged ‘Wikipedia’

DBpedia 2014 Announced

Professor Dr. Christian Bizer of the University of Mannheim, Germany, has announced the release of DBpedia 2014. DBpedia is described at dbpedia.org as “… a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data. We hope that this work will make it easier for the huge amount of information in Wikipedia to be used in some new interesting ways. Furthermore, it might inspire new mechanisms for navigating, linking, and improving the encyclopedia itself.”

The full announcement on the new release is reprinted below with Bizer’s permission.

****************

DBpedia Version 2014 released

1. The new release is based on updated Wikipedia dumps dating from April/May 2014 (the 3.9 release was based on dumps from March/April 2013), raising the number of things described in the English edition from 4.26 million to 4.58 million.

2. The DBpedia ontology has been enlarged, and the number of infobox-to-ontology mappings has risen, leading to richer and cleaner data.

The English version of the DBpedia knowledge base currently describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology (http://wiki.dbpedia.org/Ontology2014), including 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases. Read more
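For a taste of the “sophisticated queries” the project describes, here is a minimal sketch in Python that asks DBpedia’s public SPARQL endpoint for a few of the companies counted above. The class and property names (dbo:Company, dbo:foundingYear) are DBpedia ontology terms; the exact rows returned will vary from release to release.

```python
# Minimal sketch: query DBpedia's public SPARQL endpoint for a few of
# the companies the release notes count, along with their founding years.
# dbo:Company and dbo:foundingYear are DBpedia ontology terms; results
# depend on the release loaded at the endpoint.
import requests

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label ?founded WHERE {
    ?company a dbo:Company ;
             rdfs:label ?label ;
             dbo:foundingYear ?founded .
    FILTER (lang(?label) = "en")
}
LIMIT 5
"""

response = requests.get(
    "http://dbpedia.org/sparql",
    params={"query": QUERY, "format": "application/sparql-results+json"},
    timeout=30,
)
for row in response.json()["results"]["bindings"]:
    print(row["label"]["value"], "founded", row["founded"]["value"])
```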

The Web Is 25 — And The Semantic Web Has Been An Important Part Of It

NOTE: This post was updated at 5:40pm ET.

Today the Web celebrates its 25th birthday, and we celebrate the Semantic Web’s role in that milestone. And what a milestone it is: As of this month, the Indexed Web contains at least 2.31 billion pages, according to WorldWideWebSize.  

The Semantic Web Blog reached out to the World Wide Web Consortium’s current and former semantic leads to get their perspective on the roads The Semantic Web has traveled and the value it has so far brought to the Web’s table: Phil Archer, W3C Data Activity Lead coordinating work on the Semantic Web and related technologies; Ivan Herman, who last year transitioned roles at the W3C from Semantic Activity Lead to Digital Publishing Activity Lead; and Eric Miller, co-founder and president of Zepheira and the leader of the Semantic Web Initiative at the W3C until 2007.

While The Semantic Web came to the attention of the wider public in 2001, with the publication in Scientific American of The Semantic Web by Tim Berners-Lee, James Hendler and Ora Lassila, Archer points out that “one could argue that the Semantic Web is 25 years old,” too. He cites Berners-Lee’s March 1989 paper, Information Management: A Proposal, which includes a diagram showing relationships that are immediately recognizable as triples. “That’s how Tim envisaged it from Day 1,” Archer says.
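To make Archer’s point concrete, the arrows in that 1989 diagram read naturally as subject-predicate-object statements. Here is a minimal sketch using Python’s rdflib library; the URIs are illustrative stand-ins invented for this example, not terms from any published vocabulary.

```python
# Illustrative only: a few of the 1989 diagram's relationships expressed
# as subject-predicate-object triples with rdflib. The example.org URIs
# are stand-ins invented for this sketch.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()

g.add((EX.TimBernersLee, EX.wrote, EX.InformationManagementAProposal))
g.add((EX.InformationManagementAProposal, EX.describes, EX.Hypertext))
g.add((EX.TimBernersLee, EX.worksAt, EX.CERN))

for subject, predicate, obj in g:
    print(subject, predicate, obj)
```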

Read more

Google’s Popping Up Information About Search Result Sources

Google’s Knowledge Graph took on some new work this week, driving popups of information about some of the website sources that users see in their search results.

According to a posting at Google’s Search blog, clicking on the name of the information source that appears next to the link delivers details about that source, as in the picture at left. “You’ll see this extra information when a site is widely recognized as notable online, when there is enough information to show or when the content may be handy for you,” reports Bart Niechwiej, the software engineer who wrote up the news.

The feature’s been getting a lot of buzz. Much of the information informing Google’s Knowledge Graph comes from Wikipedia, as well as from Freebase and the CIA World Factbook. And when it comes to the popup information sources you’re likely to see in most searches’ results, Wikipedia will likely be among them. In fact, observers like Matt McGee over at Search Engine Land have noted of the new feature that “the popups rely heavily on Wikipedia.”

Read more

Deep Neural Network Learns Language from Wikipedia


San Francisco, CA (PRWEB) December 19, 2013 — Big news for big data: the makers of Ersatz (http://ersatz1.com/), a platform for building “deep neural networks” in the cloud, have fed their algorithm over 4 million Wikipedia articles, and this word cloud is what it learned: http://wordcloud.ersatz1.com.


The 3D word cloud combines over 25,000 words into a highly ordered network of associations. To accomplish this task, Ersatz used the patterns of the English language to learn what words go where and how they are most often presented. Read more
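Ersatz has not published the details of its network here, so the following is only a rough, hypothetical illustration of how a model can learn which words go together from co-occurrence patterns in text, using a word2vec-style model from the gensim library (an assumption for the sketch, not Ersatz’s actual stack).

```python
# Hypothetical illustration: a word2vec-style model (gensim assumed here,
# not Ersatz's actual architecture) learns word associations from which
# words co-occur. A real run would use millions of tokenized articles.
from gensim.models import Word2Vec

sentences = [
    ["semantic", "web", "links", "structured", "data"],
    ["wikipedia", "articles", "describe", "structured", "entities"],
    ["neural", "networks", "learn", "word", "associations"],
] * 100  # repeat the toy corpus so the model has enough to fit

model = Word2Vec(sentences, vector_size=32, window=3, min_count=1, epochs=20)

# Words that land near "wikipedia" in the learned association space.
print(model.wv.most_similar("wikipedia", topn=3))
```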

Senzari’s MusicGraph APIs Look To Enhance Musical Journeys


News came the other week that Senzari had announced the MusicGraph knowledge engine for music. The Semantic Web Blog had a chance to learn a little bit more about what’s underway thanks to a chat with Senzari’s COO Demian Bellumio.

MusicGraph began under the geekier name of Adaptable Music Parallel Processing Platform, or AMP3 for short, as a platform for helping users control their Internet radio. “We wanted to put more knowledge into our graph. The idea was we have really cool and interesting data that is ontologically connected in ways never done before,” says Bellumio. “We wanted to put it out in the world and let the world leverage it, and MusicGraph is a production of that vision.”

Since its announcement earlier this month about launching the consumer version on the Firefox OS platform, which lets users make complex queries about music and then listen to the results, Senzari has submitted its technology to be offered for the iOS, Android, and Windows Mobile platforms. “You can ask anything you can think of in the music realm. We connect about 1 billion different points to respond to these queries,” he says. Its data covers more than twenty million songs, connected to millions of individual albums and artists across all genres, with extracted information on everything from keys to concepts derived from lyrics.
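Senzari’s actual schema and API are not spelled out in the piece, so here is a purely hypothetical toy sketch of the underlying idea: songs, artists, keys, and lyrical concepts as nodes joined by labeled edges that a query can walk. Every name in it is invented.

```python
# Toy, hypothetical sketch of a music knowledge graph: entities as nodes,
# relations as labeled edges. None of this reflects Senzari's real API.
from collections import defaultdict

edges = defaultdict(list)

def relate(subject, predicate, obj):
    """Record one labeled edge in the graph."""
    edges[subject].append((predicate, obj))

relate("Song:Imagine", "performedBy", "Artist:JohnLennon")
relate("Song:Imagine", "hasKey", "Key:CMajor")
relate("Song:Imagine", "mentionsConcept", "Concept:Peace")
relate("Artist:JohnLennon", "memberOf", "Artist:TheBeatles")

def about(subject):
    """Answer 'tell me about X' by following every edge out of X."""
    return edges.get(subject, [])

for predicate, obj in about("Song:Imagine"):
    print("Song:Imagine", predicate, obj)
```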

Read more

Furthering the Semantic Web with a Wikipedia of Relevancy


In case you missed it, last week Jim Benedetto, CTO of Gravity, shared an interesting idea on GigaOM for how to push the semantic web forward. He writes, “Everyone is always asking me how big our ontology is. How many nodes are in your ontology? How many edges do you have? Or the most common — how many terabytes of data do you have in your ontology? We live in a world where over a decade of attempted human curation of a semantic web has borne very little fruit. It should be quite clear to everyone at this point that this is a job only machines can handle. Yet we are still asking the wrong questions and building the wrong datasets.” Read more

Hooray For Semantic Tech In The Film Industry

Image courtesy popturfdotcom/Flickr

The story below features an interview with Kurt Cagle, Information Architect at Avalon Consulting, LLC, who is speaking this week at the Semantic Technology And Business Conference in NYC. You can save $200 when you register for the event before October 2.


New York has a rich history in the film industry.  The city was the capital of film production from 1895 to 1910. In fact, a quick trip from Manhattan to Queens will take you to the former home of the Kaufman Astoria Studios, now the site of the American Museum of the Moving Image. Even after the industry moved shop to Hollywood, New York continued to hold its own, as evidenced by this Wikipedia list of films shot in the city.


This week, at the Semantic Technology & Business Conference, a session entitled Semantics Goes Hollywood will offer a perspective on the technology’s applicability to the industry for both its East and West Coast practitioners (and anyone in between). For that matter, even people in industries of completely different stripes stand to gain value: As Kurt Cagle, Information Architect at Avalon Consulting, LLC, who works with many companies in the film space, explains, “A lot of what I see is not really a Hollywood-based problem at all – it’s a data integration problem.”


Here’s a spotlight on some of the points Cagle will discuss when he takes the stage:


  • Just like any enterprise, studios that have acquired other film companies face the challenge of ensuring that their systems can understand the information that’s stored in the systems of the companies they bought. Semantic technology can come to the fore here, as it has for industries that might not have the same aura of glamour surrounding them. “Our data models may not be completely in sync but you can represent both and communicate both into a single composite data system, and a language like SPARQL can query against both sets to provide information without having to do a huge amount of re-engineering,” Cagle says. (A minimal sketch of that idea follows below.)
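Here is that sketch: a minimal rdflib illustration in which two acquired catalogs use different vocabularies, yet a single SPARQL query spans both once they are loaded into one composite graph. The studio vocabularies and film data are invented for the example.

```python
# Minimal sketch of Cagle's point: two vocabularies, one composite graph,
# one SPARQL query. The studioA/studioB terms and films are invented.
from rdflib import Graph

studio_a = """
@prefix sa: <http://example.org/studioA/> .
sa:film42 sa:title "The Long Take" ; sa:releaseYear 1978 .
"""

studio_b = """
@prefix sb: <http://example.org/studioB/> .
sb:pic7 sb:name "Night Reel" ; sb:year 1983 .
"""

g = Graph()
g.parse(data=studio_a, format="turtle")
g.parse(data=studio_b, format="turtle")

# One query covers both data models, with no re-engineering of either.
results = g.query("""
    PREFIX sa: <http://example.org/studioA/>
    PREFIX sb: <http://example.org/studioB/>
    SELECT ?title ?year WHERE {
        { ?film sa:title ?title ; sa:releaseYear ?year . }
        UNION
        { ?film sb:name ?title ; sb:year ?year . }
    }
""")
for title, year in results:
    print(title, year)
```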

Read more

DBpedia 3.9 Hits The Runway


DBpedia 3.9 is up and going. Word came today from Christian Bizer and Christopher Sahnwaldt that the new release boasts an overall increase in the number of concepts in the English edition from 3.7 to 4 million things, thanks to being based on updated Wikipedia dumps from the spring of 2013.

Other numbers to impress:

Read more

At SemTechBiz, Knowledge Graphs Are Everywhere

Sing along with me to this classic hit from 1980: “Knowledge graphs are everywhere; They’re everywhere; My mind describes them to me.”

That’s Our Daughter’s Wedding’s song Lawn Chairs, give or take a lyric. But it’s a good description of some of the activity at the Semantic Technology & Business Conference this week, which saw Google, Yahoo and Wikidata chatting up the topic of Knowledge Graphs. On Tuesday, for example, Google’s Jason Douglas provided insight into how the search giant’s Knowledge Graph is critical to meeting a new world of search requirements that’s focused on providing answers and acting in an anticipatory way (see story here), while Wednesday’s closing keynote had Wikimedia Deutschland e.V. project director Denny Vrandecic getting the audience up to date with Wikidata – aka, Wikipedia’s Knowledge Graph For, And By, Everyone.

There are some 280 language versions of Wikipedia for which Wikidata serves as the common source of structured data. Wikidata now has an entity base of more than 12 million items that represent the topics of Wikipedia articles, Vrandecic said during his presentation.
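For a concrete taste of how one Wikidata item can serve all of those language editions, here is a minimal sketch that pulls a single entity’s labels from Wikidata’s public wbgetentities API (item Q42, Douglas Adams, chosen arbitrarily).

```python
# Minimal sketch: one Wikidata item carries labels for many language
# editions. Fetch item Q42 (Douglas Adams) and print a few of them.
import requests

resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "wbgetentities",
        "ids": "Q42",
        "props": "labels",
        "format": "json",
    },
    timeout=30,
)
labels = resp.json()["entities"]["Q42"]["labels"]
print(len(labels), "language labels; for example:")
for lang in ("en", "de", "ja"):
    if lang in labels:
        print(lang, "->", labels[lang]["value"])
```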

Read more

Wikilinks Corpus: What Will You Do With 40 Million Disambiguated Entity Mentions Across 10 Million-Plus Web Pages?

Last Friday saw the release of the Wikilinks Corpus from Research at Google, 40 million entities in context strong.

As explained in a blog post by Dave Orr, Amar Subramanya, and Fernando Pereira at Google Research, the Big Data set “involves 40 million total disambiguated mentions within over 10 million web pages — over 100 times bigger than the next largest corpus.” The mentions, the post relates, are found by looking for links to Wikipedia pages where the anchor text of the link closely matches the title of the target Wikipedia page. If each page on Wikipedia is thought of as an entity, then the anchor text can be thought of as a mention of the corresponding entity, it says.
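The post does not spell out the exact matching criterion, but the heuristic it describes can be sketched roughly as follows; the 0.8 similarity threshold below is an assumption made for illustration, not Google’s actual cutoff.

```python
# Rough sketch of the heuristic the Google Research post describes:
# a hyperlink into Wikipedia counts as an entity mention when its anchor
# text closely matches the target page title. The 0.8 threshold is an
# assumption for illustration, not Google's actual criterion.
from difflib import SequenceMatcher
from urllib.parse import unquote, urlparse

def wikipedia_title(url):
    """Return the page title from a Wikipedia URL, or None otherwise."""
    parsed = urlparse(url)
    if not parsed.netloc.endswith("wikipedia.org"):
        return None
    return unquote(parsed.path.removeprefix("/wiki/")).replace("_", " ")

def is_entity_mention(anchor_text, url, threshold=0.8):
    """True when the anchor text closely matches the target page title."""
    title = wikipedia_title(url)
    if title is None:
        return False
    similarity = SequenceMatcher(None, anchor_text.lower(), title.lower()).ratio()
    return similarity >= threshold

print(is_entity_mention("Barack Obama",
                        "https://en.wikipedia.org/wiki/Barack_Obama"))  # True
print(is_entity_mention("the president",
                        "https://en.wikipedia.org/wiki/Barack_Obama"))  # False
```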

Read more
