Articles

Wikimeta Project’s Evolution Includes Commercial Ambitions and Focus On Text-Mining, Semantic Annotation Robustness

Wikimeta, the semantic tagging and annotation architecture for incorporating semantic knowledge within documents, websites, content management systems, blogs and applications, this month is incorporating itself as a company called Wikimeta Technologies.  Wikimeta, which has a heritage linked with the NLGbAse project, last year was provided as its own web service.

Dr. Eric Charton, Ph.D, MSc at École Polytechnique de Montréal, is project leader and author of the Wikimeta code. The NLGbAse project was conducted by Charton at the University of Avignon as part of his Ph.D. thesis. The Semantic Web Blog recently hosted an email discussion with him to learn more about the Wikimeta architecture and its evolution.

 

The Semantic Web Blog: Tell us about the NLGbAse project and Wikimeta’s relationship to it.

Charton: NLGbAse is an ontology extracted from Wikipedia. It is used in Wikimeta as a resource for semantic disambiguation. For each Wikipedia document (aka Semantic Concept), NLGbAse provides various ways of word-writing (for example, “General Motors” can be written “GM Company”, “GM”, “General Motors Corp” and so on), used for detection.
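
To make the idea of multiple written forms per concept concrete, here is a minimal Python sketch of dictionary-based detection. It is not Wikimeta's or NLGbAse's actual code: the table layout, the function name detect_concepts, and the concept label "General_Motors" are assumptions made for illustration, and the sketch skips the disambiguation step that Charton describes.

```python
# Hypothetical surface-form table. The spellings come from the interview;
# the concept label "General_Motors" is an assumed identifier.
SURFACE_FORMS = {
    "general motors corp": "General_Motors",
    "general motors": "General_Motors",
    "gm company": "General_Motors",
    "gm": "General_Motors",
}

PUNCTUATION = ".,;:!?"


def detect_concepts(text, surface_forms=SURFACE_FORMS):
    """Return (matched_text, concept) pairs, preferring the longest match."""
    tokens = text.split()
    max_len = max(len(form.split()) for form in surface_forms)
    found = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so that
        # "General Motors Corp" wins over "General Motors".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).strip(PUNCTUATION)
            concept = surface_forms.get(candidate.lower())
            if concept is not None:
                found.append((candidate, concept))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return found
```

Calling detect_concepts("Shares of General Motors Corp rose while GM cut costs.") maps both "General Motors Corp" and "GM" to the same concept label. A real system would still need to decide whether a short form such as "GM" refers to the company in a given context.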

Semantic Tech & Business Conference Returns to San Francisco

Semantic Tech & Business Conference returns to San Francisco in June! Join us from June 3-7 for complete coverage of Big Data, Linked Data, Extreme Information Management, and Semantic Web. From breakthrough approaches to solving business problems to the big data implications of fast-evolving technologies, SemTechBiz provides you with an unparalleled interactive experience and delivers tangible business value. We're offering a special early rate when you register by February 17. Sign up now!

SemTech Berlin 2012 Conference Explorer App Gives a Taste of Linked-Data-As-A-Service

Want to have a peek into the semantic applications that can result when the cloud and Linked Data-as-a-service join up? Start with a trip here, where you’ll find the SemTech Berlin 2012 Conference Explorer (among other event explorers). It lets attendees browse through conference metadata – and more – to help them plan for next week’s event.

The application was built with fluid Operations’ Information Workbench, which is a platform for building self-service Linked Data cloud apps; the company also provides the eCloudManager Product Suite, for public and private cloud management. With Information Workbench, users can get over some of the challenges of making Linked Data usable, such as automatically discovering and integrating data sources, dealing with heterogeneity in data sets and access, and planning end-user oriented interfaces and interaction paradigms, says Peter Haase, a senior architect at fluid Operations, who will be speaking about Linked Data-as-a-service at SemTech Berlin.

Digital Reasoning To Give Users New Tool For “Learning” Custom Data Sets

Digital Reasoning, developer of the Synthesys platform for discovering the meaning in unstructured data at scale, has on its roadmap packaging up and exposing to its customers a simplified version of its internal technology for teaching the system new grammatical structures, so that it can quickly understand custom or otherwise specific data sets.

The company has quickly added support for new languages such as Arabic, traditional and simplified Chinese, Farsi and Urdu (with more languages on the way) to Synthesys using the tool. The tool gets the software up to speed on each one in just a few weeks by teaching it the grammatical structure and then letting it go off and figure out what the words mean for its work of transforming unstructured (and structured) data into the underlying facts, entities, relationships, and associated terms.

“In the same way we teach it languages you may have a data set that is highly scientific, for example, and this tool essentially makes it easier for our customers to make Synthesys even more accurate for that specific set of data,” says Dave Danielson, VP of marketing.

Is Your Business Ready for the Semantic Web?

What makes a business ripe to adopt semantic web technologies? Those engaged in cross-enterprise business processes, in particular where models based on web technologies drive greater collaboration and increased dynamism, are on the list, says Professor Adrian Paschke, Corporate Semantic Web chair at the Institute of Computer Science at the Freie Universität Berlin and head of the InnoProfile project Corporate Semantic Web.

“That is motivation to apply semantic web technologies because you no longer are working in closed walls where you build your own schema and database model, but you need a flexible semantic model that easily integrates with others,” says Paschke.

Parse.ly Brings A Dash of Semantics To Online Publishers

Online publishers and other content providers have a new analytics tool to help them understand what their readers care about and use that information to better connect them to their sites’ relevant and compelling content. Launching today is Dash, based on the predictive content analytics platform Parse.ly. The technology crawls every article page for Parse.ly’s publisher-partners and analyzes the text, in real time and at scale, to identify relevant topics and group related content together. Behind this lies natural language processing technology, which uses language cues hidden inside the text to determine its affiliated topics. To date Dash has extracted over 350,000 unique topics from all the URLs it has crawled during private beta, yielding a healthy taxonomy of the topics being consumed by users across the web.

Smooth As Silk (App) Web Sites

Want web sites to run as smooth as silk? So do the developers behind Silk, who’ve been working the last couple of years to make it easy to apply semantics to create more powerful web sites, with information that can be used more effectively.

Silk, which The Semantic Web Blog previously has covered here and here, now is in the process of testing its WYSIWYG Silk Editor with a select user set, and is slowly inviting more interested parties to get involved. It expects to release it publicly soon. The simplicity of the Silk Editor, says Sander Koppelaar, head of business development, is that it looks very much like familiar environments – think a graphical Wiki – while supporting tagging information on a page, such as the population or capital of Amsterdam, if that were the subject.

“That way you first create pages that are very handy for users because they are built for humans, containing text and images you’d see on a normal web site,” he says. “But more or less without noticing it you build on your data model and can start to use that to create the great overviews and answer actual questions about the data.”

Common Crawl Founder Gil Elbaz Speaks About New Relationship With Amazon, Semantic Web Projects Using Its Corpus, And Why Open Web Crawls Matter To Developing Big Data Expertise

The Common Crawl Foundation’s repository of openly and freely accessible web crawl data is about to go live as a Public Data Set on Amazon Web Services. The non-profit Common Crawl is the vision of Gil Elbaz, who founded Applied Semantics and the AdSense technology for which Google acquired it, as well as the Factual open data aggregation platform, and it counts Nova Spivack, who’s been behind semantic services from Twine to Bottlenose, among its board of directors.

Elbaz’ goal in developing the repository: “You can’t access, let alone download, the Google or the Bing crawl data. So certainly we’re differentiated in being very open and transparent about what we’re crawling and actually making it available to developers,” he says.

“You might ask why is it going to be revolutionary to allow many more engineers and researchers and developers and students access to this data, whereas historically you have to work for one of the big search engines…. The question is, the world has the largest-ever corpus of knowledge out there on the web, and is there more that one can do with it than Google and Microsoft and a handful of other search engines are already doing? And the answer is unquestionably yes.”

Stop SOPA Protest Gets Underway With DBpedia.org On Board

Editor’s Update Jan. 19: DBpedia, Wikipedia and company are all back online, while some lawmakers have taken their support for SOPA and PIPA offline. Republican Senators Roy Blunt and Marco Rubio have withdrawn their support for the Protect IP Act, and Representative Lee Terry (R-Neb.), an original co-sponsor of SOPA, also has asked to have his name removed from the bill.

 

It’s Stop Online Piracy Act (SOPA) day. At 8 a.m. EST, OpenLink Software began a 12-hour blackout of the following sites it controls in support of Wikipedia, Reddit and others spearheading the online protest against the legislation:

Founder and CEO of OpenLink Software Kingsley Idehen yesterday directed interested parties to a Linked Data-driven poll for the opportunity to vote on taking this step, and the ayes, so to speak, had it.

Turn to any of the above sites and you’ll see:

Big Data For Lean Startups, Or, A Poor Man’s Watson

What do big companies have that most emerging businesses don’t have to help them get value from Big Data? Well, to start with, there’s lots of money and a ton of technology resources.

Never fear. At the upcoming Semantic Tech & Business conference in Berlin, Christopher Testa, CTO of startup WhiteBox Inc., plans to give companies with considerably fewer resources than giants like Google and IBM insight into how to use Big Data as a small, lean startup. His guidance will draw from his own past experiences at Google training AdSense; lessons learned studying the development of IBM’s Watson; and his current efforts to apply Big Data principles to create an expert system for amateur radio operator license exams at his own startup, with limited engineering resources. Most recently Testa was head of engineering at Ad.ly, and that will factor into advice about how to run a data center with free and open source solutions, too.

Lessons Learned On the Road To Linked Data

What’s the path from an XML-based e-government metadata application to a linked data version? At the upcoming Semantic Tech & Business Conference in Berlin, the road taken by the Dutch government will be described by Paul Hermans, lead architect of Belgian project Erfgoedplus.be, which uses RDF/XML, OWL and SKOS to describe relationships to heritage types, concepts, objects, people, place and time.

Some 1,000 individual organizations compose the Dutch government, each with its own website. An effort a few years ago to use a search engine to spider those separate web sites and provide a single point of access didn’t work as anticipated. The next step to bring some order was to assign all the documents published on those sites a common kernel of metadata fields, which led to building an XML application to enable a structured approach. Linked Data entered the picture about a year and a half ago.
