Photo credit: Flickr/Jessica Mullen

Document management as you know it probably isn’t delivering what you’d really like out of it, is it? “The complexity of document management is increasing a lot,” says George Roth, president and CEO of semantic technology integrator and consultancy Recognos Inc. Roth will be speaking about semantic technology’s impact on document management, and on all the unstructured data that lies within documents, at the approaching Semantic Tech & Business Conference in Washington, D.C. (The event takes place at the end of November.)

“First, the volume of documents people are dealing with is increasing. And searching for information in general takes a lot of time. In different industries, like biotech or legal or finance, when people are doing research, 40 to 60 percent of their time is spent trying to find relevant documents,” he says. Classical tagging and superficial categorization can’t scale. “Keyword searches are actually obsolete at this point because the returned set of results is huge.”

As Roth sees it, if semantic technology isn’t behind your document management system yet, it will be.

Consider, for instance, DM systems where categorization may be done at a very high level: One might put legal agreements under a category dubbed ‘contracts.’ But what to do as the state of those contracts evolves? “One might become bad, for instance, so I need to re-categorize that,” he says, but that’s not generally a dynamic capability today. “Let’s say I have a contract with Company Y, which is in the financial industry and which is very much linked to another company that I know about, but that information is not explicit in the particular contract document. Then this other company goes bankrupt,” Roth puts forth as an example. It sucks up resources – often expert and highly paid ones, like researchers – to manually check for potential problems like that on a regular basis. Using NLP techniques to extract information across documents, though, could make the challenge easier to deal with by enabling live, dynamic document categorization, flagging, and alerts for such conditions. “If these become live documents then I could re-categorize them on the fly in a more granular way to flag potential problems in this contract case and so on,” he says.
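To make Roth’s contract example concrete, here is a minimal Python sketch of re-categorizing contracts on the fly when a linked company goes bankrupt. It is an illustration only, not a description of any Recognos product: the `Contract` class, the `LINKED_COMPANIES` table, the "Company Z" name, and the "contracts/at-risk" category are all hypothetical, and the sketch assumes the company links have already been extracted from other documents.

```python
from dataclasses import dataclass, field


@dataclass
class Contract:
    contract_id: str
    counterparty: str
    category: str = "contracts"          # coarse, static category
    flags: set = field(default_factory=set)


# Hypothetical relationships that are NOT stated in the contract itself;
# assumed here to have been extracted from other documents.
LINKED_COMPANIES = {"Company Y": {"Company Z"}}


def flag_on_bankruptcy(contracts, bankrupt_company, links=LINKED_COMPANIES):
    """Re-categorize contracts whose counterparty is, or is linked to, the bankrupt company."""
    flagged = []
    for contract in contracts:
        linked = links.get(contract.counterparty, set())
        if contract.counterparty == bankrupt_company or bankrupt_company in linked:
            contract.category = "contracts/at-risk"
            contract.flags.add(f"linked-bankruptcy:{bankrupt_company}")
            flagged.append(contract.contract_id)
    return flagged


contracts = [Contract("C-1", "Company Y"), Contract("C-2", "Company Q")]
flagged = flag_on_bankruptcy(contracts, "Company Z")
```

In this sketch the contract with Company Y is flagged even though Company Z never appears in it, which is the kind of check that would otherwise fall to a researcher.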

In fact, the movement to include semantic capabilities as part of DM systems has already started, he says. He cites as an example Microsoft Sharepoint11 and the vendor’s $1.2 billion buy of search vendor FAST Search a while back. “The shift to semantic search [for the enterprise] is happening big-time, and I think Microsoft is one of the leaders in this,” Roth says, even if Microsoft isn’t advertising the semantics behind its system.

The wave of the document management future, he thinks, also will include compliance integration. “It will be compliance based on systems being able to extract information,” he says. It will become possible, for instance, to discover and match what may be disconnected pieces of information across internal documents and other sources – say, that a director on a mutual fund’s board also is a member of the board of another company that the fund invests in – with compliance rules that show that relationship to be a violation. “That is the type of compliance rule that can be checked using this new technology,” he says.
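The board-membership rule Roth describes can be sketched in a few lines of Python. This is a simplified illustration, not an actual compliance engine: the function name, the fund, company, and director names, and the three lookup tables are all hypothetical, and the sketch assumes the facts have already been extracted from documents into plain dictionaries.

```python
def find_board_conflicts(fund_directors, fund_holdings, company_boards):
    """Return (fund, person, company) for each fund director who also sits
    on the board of a company that the fund invests in."""
    conflicts = []
    for fund, directors in fund_directors.items():
        for company in fund_holdings.get(fund, ()):
            overlap = directors & company_boards.get(company, set())
            for person in sorted(overlap):
                conflicts.append((fund, person, company))
    return conflicts


# Hypothetical facts, each assumed to come from a different source document.
fund_directors = {"Fund A": {"J. Doe", "R. Roe"}}
fund_holdings = {"Fund A": ["Acme Corp", "Globex"]}
company_boards = {"Acme Corp": {"J. Doe"}, "Globex": {"M. Moe"}}

conflicts = find_board_conflicts(fund_directors, fund_holdings, company_boards)
```

The check itself is a simple set intersection. The hard part, and the part the semantic technology is meant to handle, is populating those tables reliably from unstructured text.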

There’s a matter of leveraging the open web as part of getting more value out of the information you have in your own systems, of course. “Say you are investing in different companies. What you can do is combine my internal information about this investment, like how much, the shareholders, and so forth. But at the same time I can collect data about this company from the web,” he says. “I can bring in information like lawsuits and all the risk factors. So you can bring data from outside inside, but the caution here is trust, to make sure about what you’re bringing in.”