Posts Tagged ‘Entity Extraction’

Day of the Dolphin: Swim In the Personalized Social Stream With Bottlenose

It’s the Day of the Dolphin. Bottlenose (previously known as Bottleno.se), which we initially covered here, moves out of stealth and into private beta mode. The service lassoes your Twitter, Facebook and Yammer streams, and drives real-time understanding and surfacing of personally relevant content so that you don’t have to read everything (not that you ever could!). It debuts with a new architecture that leverages “crowd computing” to enable scale and to build ever more “semantic stream” smarts around the flood of information on social networks.

Nova Spivack and his co-founder and CTO Dominiek ter Heide (formerly CTO of Cerego Japan, who has long been tackling the issue of distilling interest profiles behind social streams) are the minds behind the service. Spivack has essentially referred to Bottlenose as everything, and more, that Twitter Annotations never was.

Read more

Semantic Tech & Business Conference Returns to San Francisco

Semantic Tech & Business Conference returns to San Francisco in June! Join us from June 3-7 for complete coverage of Big Data, Linked Data, Extreme Information Management, and Semantic Web. From breakthrough approaches to solving business problems to the big data implications of fast-evolving technologies, SemTechBiz provides you with an unparalleled interactive experience and delivers tangible business value. We're offering a special early rate when you register by February 17. Sign up now!

Semantic Tech’s On The Way to Document Management Systems

Photo credit: Flickr/ Jessica Mullen

Document management as you know it probably isn’t delivering what you’d really like out of it, is it? “The complexity of document management is increasing a lot,” says George Roth, president and CEO of semantic technology integrator and consultancy Recognos Inc., who will be speaking about semantic technology’s impact on document management and all the unstructured data that lies within documents at the approaching Semantic Tech & Business Conference in Washington, D.C. (The event takes place at the end of November.)

“First, the volume of documents people are dealing with is increasing. And searching for information in general takes a lot of time. In different industries, like biotech or legal or finance, when people are doing research, 40 to 60 percent of their time is spent trying to find relevant documents,” he says. Classical tagging and superficial categorization can’t scale. “Keyword searches are actually obsolete at this point because the returned set of results is huge.”

As Roth sees it, if semantic technology isn’t behind your document management system yet, it will be.

Read more

What Have You Liked Today — And What Are You Going To Do About It?

So, how many things have you liked today? Chances are that somewhere in the last 24 hours you’ve given a thumbs-up to a news article you came across on a friend’s Facebook post, a movie on Netflix, or a beer garden on Foursquare.

An application in beta from Cascaad, dubbed CircleMe, hopes to be the single source for hosting and managing all your likes. “Typically you leave those traces all over the web but they aren’t leveraged,” says Erik Lumer, Cascaad founder and executive chairman. “It’s in your profile somewhere but you’re not getting much out of it.” Lumer says Cascaad is betting there’s value in helping users manage the activity on their likes in one place, so that they can get more out of them, such as more easily tracking new developments connected to what they already like, or getting recommendations from others with similar interests. And to do it with greater permanence, so to speak. As Lumer points out, you can potentially discover a new book on Facebook that one of your friends liked, but “two hours later it’s gone. There are hundreds of messages on top of it. There’s not a clean way to leverage that effectively, so in that sense I think we are very complementary” to Facebook likes.

Read more

Alchemy Aims to Add More API Wizardry

Orchestr8’s NLP- and machine learning-based AlchemyAPI service for analyzing content and extracting semantic metadata has added some new capabilities.

One new feature is dubbed Relation Extraction, which project engineer Shaun Roach tells the Semantic Web Blog “detects sentences describing actions, events, and facts, and then codes them into a machine-readable format.  It is a key feature for developers who want to go a step beyond tagging, to understand specifically how all the people, places, and things mentioned in the document are interacting.”

So, it processes natural language, and converts documents and web pages into actionable, semantically enriched “Subject-Action-Object” data, as the company blog describes it.
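To make the “Subject-Action-Object” idea concrete, here is a minimal sketch in Python of what such machine-readable records might look like. It is not AlchemyAPI's method or output format: the `Relation` class, the `ACTIONS` verb list and the `extract_relations` function are hypothetical names invented for illustration, and a crude regular expression stands in for the NLP and machine learning the service actually relies on.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One machine-readable Subject-Action-Object record (hypothetical shape)."""
    subject: str
    action: str
    obj: str


# Hypothetical, tiny verb list; a real system would use a parser and trained models.
ACTIONS = ("acquired", "announced", "released")


def extract_relations(text: str) -> list[Relation]:
    """Return one Relation per sentence that contains a known action verb."""
    relations = []
    # Split on sentence-ending punctuation; crude, but enough for a sketch.
    for sentence in re.split(r"[.!?]\s*", text):
        for action in ACTIONS:
            # Match "<subject> <action> <object>" around the known verb.
            match = re.match(rf"^(.+?)\s+{action}\s+(.+)$", sentence.strip())
            if match:
                relations.append(Relation(match.group(1), action, match.group(2)))
                break
    return relations


if __name__ == "__main__":
    for relation in extract_relations("OpenText acquired Nstein. The weather was nice."):
        print(relation)
```

The point of the structure, rather than the toy matching, is that each sentence describing an action becomes a record a program can query, which is the step beyond tagging that Roach describes.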

Read more

What’s Next For OpenText As It Continues Integration of Nstein’s Technologies?

Since Nstein was acquired by OpenText a little over a year ago, work has been underway to build the former’s semantic technology for text mining and analytics and search into the latter’s enterprise content management platform. So far, that’s resulted in adding Semantic Navigation, the on-premise or cloud web site search and content discovery solution, to OpenText’s Web content management (WCM) products, such as OpenText Web Experience Management and Web Site Management.

This covers aspects such as content tagging and semantic faceting at the content and document levels. This year and the following should see further integration of Nstein technologies into the OpenText solutions set, as well as some new offerings emerging to support other use cases.

As an example, the company is working on a listening platform application, drawing on work Nstein had done for the Canadian government’s public health agency that used its Text Mining Engine to identify potential threats to human health by scouring multiple sources (including news aggregators like Factiva) that were parsed for about 1,000 or so concepts such as “mysterious ailments” and “outbreak.” It’s building up a framework for ingesting different data sources to support this, says Charles-Olivier Simard, product manager for semantic technologies at OpenText.

Read more

Patent, Patent, Digital Reasoning’s Got a Text Discovery Patent

Are you starting to hear more about patents that relate to the Semantic Web space? There was an interesting discussion by Erik Sherman here on Facebook’s patent for automatic search curation as feeding its semantic search ambitions, for instance.

Generally speaking, in fact, patents are big in the news, with the passage last week by the Senate of the Patent Reform Bill, which has among its goals getting patents issued sooner — but which also is spurring concern, especially in the tech industry, about its impact on patent infringement actions.

Against this backdrop, and perhaps flying a bit more under the radar, was a U.S. patent (No. 7,882,055) granted to Digital Reasoning for its distributed system of intelligent software agents for discovering the meaning in text. Company CEO Tim Estes calls what the vendor has applied to its Synthesis technology a “bottom-up” patent.

Specifically, it covers the mechanism of measurement and the applications of algorithms to develop machine-understandable structures from patterns of symbol usage, the company says, as well as the  semantic alignment of those learned structures from unstructured data with pre-existing structured data — a necessary step in creating enterprise-class entity-oriented systems.

So, in plain(er) English, it’s about using algorithms to bootstrap the creation of semantic models from large-scale unstructured data with minimal a priori information – in other words, to let the data speak for itself. It aims at being a fast route to entity-oriented analytics for harvesting critical facts and relationships across a spread of information in documents.

Read more

Time for Semantic ETL?

What’s the link between the trends of more and more objects and even commercial transactions on the web being described in a machine-readable, semantic format and the endless streaming of all that data? Revenue-funded startup First Retail, whose principals Anne Jude Hunt and Simon G. Handley will be speaking at the upcoming Semantic Technology Conference in June, thinks the answer is semantic ETL.

Extract, transform, load (ETL) is a widely known concept in the well-charted terrain of the IT world. That’s about transforming a bunch of heterogeneous data to unify it within a data warehouse and get some use out of it.

Semantic ETL, says Hunt, is brought on by the fact that today people want to deal with the growing loads of streaming data while it’s streaming and that “people want intelligent data, machine-readable tags, [they want] to slice and dice it for BI in lots of different ways, so the traditional data warehouse and relational database approach is just not working for people.” Cleansed and integrated semantic data loaded into distributed, scalable triple stores can come to the rescue.
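As a rough illustration of the extract, transform and load steps applied to semantic data, the following Python sketch unifies two differently shaped sources into subject-predicate-object triples. It is not First Retail's pipeline: the `http://example.org/` namespace, the field names and all function names are assumptions made up for the example, and a plain in-memory set stands in for a distributed triple store.

```python
import csv
import io
import json

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def extract(csv_text, json_text):
    """Extract: read product records from two differently shaped sources."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {"id": row["sku"], "name": row["title"], "price": row["price"]}
    for item in json.loads(json_text):
        yield {"id": item["id"], "name": item["label"], "price": str(item["cost"])}


def transform(record):
    """Transform: turn one unified record into subject-predicate-object triples."""
    subject = f"<{BASE}product/{record['id']}>"
    return [
        # json.dumps adds the surrounding quotes and escapes the literal.
        (subject, f"<{BASE}name>", json.dumps(record["name"])),
        (subject, f"<{BASE}price>", json.dumps(record["price"])),
    ]


def load(store, triples):
    """Load: add triples to the store (a set standing in for a triple store)."""
    store.update(triples)


def run(csv_text, json_text):
    """Run the three steps end to end and return the populated store."""
    store = set()
    for record in extract(csv_text, json_text):
        load(store, transform(record))
    return store


if __name__ == "__main__":
    triples = run("sku,title,price\nA1,Lamp,19.99\n",
                  '[{"id": "B2", "label": "Desk", "cost": 120}]')
    for s, p, o in sorted(triples):
        print(s, p, o, ".")
```

Because every source ends up in the same triple shape, new questions can be asked of the data without redesigning a warehouse schema first, which is the flexibility Hunt is pointing at.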

Read more

The Spotlight’s on DBpedia

The spotlight’s on DBpedia. Literally. A new open source tool that goes by the name of DBpedia Spotlight annotates mentions of DBpedia resources in text to link unstructured information sources to the Linked Open Data cloud. The idea behind it was to ‘go generic’ so that users could download, adapt and integrate it with their own stacks to meet their specific needs.

That idea started to play out in the community just a day or two after the tool’s release, in fact. The EuropeanaConnect Media Annotation Prototype is using DBpedia Spotlight to support semantic tagging and annotation of image, audio and video content. That’s something that one of DBpedia Spotlight’s creators, Pablo N. Mendes, hadn’t foreseen.
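The core idea of annotating mentions in text with links to DBpedia resources can be sketched in a few lines of Python. This is a toy dictionary lookup, not DBpedia Spotlight's algorithm or API: the `SURFACE_FORMS` table and the `annotate` function are hypothetical, and the real tool also has to find candidate mentions and disambiguate between resources that share a name.

```python
import re

# Hypothetical surface-form dictionary. The URIs follow DBpedia's
# http://dbpedia.org/resource/<Name> pattern; the table itself is made up.
SURFACE_FORMS = {
    "Berlin": "http://dbpedia.org/resource/Berlin",
    "San Francisco": "http://dbpedia.org/resource/San_Francisco",
}


def annotate(text):
    """Return mentions found in text, each with its offset and linked URI."""
    annotations = []
    for surface, uri in SURFACE_FORMS.items():
        # Whole-word matches only, so "Berliner" does not count as "Berlin".
        for match in re.finditer(rf"\b{re.escape(surface)}\b", text):
            annotations.append(
                {"surface": surface, "offset": match.start(), "uri": uri}
            )
    # Report mentions in the order they appear in the text.
    return sorted(annotations, key=lambda a: a["offset"])


if __name__ == "__main__":
    for annotation in annotate("The conference moves from Berlin to San Francisco."):
        print(annotation)
```

Each annotation ties a span of unstructured text to a Linked Open Data identifier, which is what lets downstream tools such as a media annotation prototype reuse it.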

Read more

High Precision Entity Extraction: A U.S. State Department Case Study – SemTech 2009 Audio

Joseph C. Wicentowski, U.S. Department of State
Dan McCreary, Dan McCreary and Associates

The U.S. State Department’s Office of the Historian has embarked on an ambitious effort to migrate its diplomatic history document archive from paper to an enriched electronic medium for online consumption. Because of Congressional mandates, we have extremely high standards for semantic precision and accuracy, which make this unique resource useful to a broad audience of scholars, government officials, and the general public. Furthermore, the new format allows us to repurpose our content and integrate it with "mashup" applications such as timelines and geographical map views.

This case study reviews the U.S. State Department’s requirements and the decision process that led us to adopt high-precision semantic markup standards that are supported by our tools as well as by our vendors. We will review our requirements and decision-making, and will show concrete examples of how the precise identifiers for people, locations, and events allow us to enrich the display of our documents online.

We will also review the full document lifecycle and the need for automated but high quality entity extraction tools to minimize document conversion costs. This case study will discuss some of the tradeoffs others may face when advanced technology decisions have both risks and rewards for the digital historian.

In this presentation we will:

  • Review business requirements for a high precision entity extraction application
  • Describe our semantic approach
  • Demonstrate entity extraction
  • Demonstrate timeline and other mashups
  • Summarize project benefits

Attachment: High Precision Entity Extraction – A US State Department Case Study.mp3 (54.54 MB)

Presenters:

Joe Wicentowski

After completing a Fulbright grant in Asia for his doctoral research and receiving his Ph.D. in History from Harvard University, Joseph C. Wicentowski joined the U.S. Department of State’s Office of the Historian. He has taken a leadership role in digital history management as a digital historian, developing new digital formats for the Department’s archive of U.S. diplomatic and foreign affairs documents, which reach back to the founding of the historian’s office in 1861. He has led development of a new website for these documents, based on a native XML database, and is working to bring the benefits of data visualization, metadata management, and other digital history applications to the federal government and the public. He has particular interests in XML, XQuery, and U.S. and Chinese history.

Dan McCreary

Dan is an enterprise data architect/strategist living in Minneapolis. He has worked for organizations such as Bell Labs and Steve Jobs’s NeXT Computer and founded his own consulting firm of over 75 people. He has a background in object-oriented programming and declarative XML languages (XSLT, XML Schema design, XForms, XQuery, RDF, and OWL). He has published articles on various technology topics including the Semantic Web, metadata registries, enterprise integration strategies, XForms, and XQuery. He is the author of the XForms Tutorial and Cookbook.

Entity Extraction and the Semantic Web

Entity Extraction is the process of automatically extracting document metadata from unstructured text documents. Extracting key entities such as person names, locations, dates, specialized terms and product terminology from free-form text can empower organizations not only to improve keyword search but also to open the door to semantic search, faceted search and document repurposing. This article defines the field of entity extraction, outlines some of the technical challenges involved, and shows how RDF can be used to store document annotations. It then shows how new tools such as Apache UIMA are poised to make entity extraction much more cost-effective for an organization.
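As a minimal sketch of the two ideas described here, pulling entities out of free-form text and storing the resulting annotations as RDF, the Python below uses two hand-written patterns and prints N-Triples-style statements. It does not use Apache UIMA and is not the article's method: the document URI, the `http://example.org/vocab#` vocabulary, the patterns and the function names are all assumptions chosen for illustration.

```python
import re

DOC = "<http://example.org/doc/1>"   # hypothetical document URI
VOCAB = "http://example.org/vocab#"  # hypothetical annotation vocabulary

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Two deliberately simple patterns; real extractors combine dictionaries,
# grammars and statistical models to reach useful precision.
PATTERNS = {
    "date": re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b"),
    "person": re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+(?: [A-Z][a-z]+)?"),
}


def extract_entities(text):
    """Return (kind, value) pairs for every pattern match in the text."""
    return [
        (kind, match.group(0))
        for kind, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]


def to_ntriples(entities):
    """Serialize each entity as one statement about the document."""
    lines = []
    for kind, value in entities:
        # Escape backslashes and quotes so the literal stays well-formed.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{DOC} <{VOCAB}mentions{kind.capitalize()}> "{literal}" .')
    return lines


if __name__ == "__main__":
    sample = "Dr. Jane Smith met the delegation on June 3, 2009."
    for line in to_ntriples(extract_entities(sample)):
        print(line)
```

Once the annotations live as triples alongside the document, a faceted or semantic search layer can filter on them, for example by listing every document that mentions a given person.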

Read more