The healthcare industry, and the American citizenry at large, has been focused of late on the problems surrounding the implementation of the Affordable Care Act, the federal website’s issues foremost among them. But believe it or not, there are other things the healthcare industry needs to prepare for, among them the October 1, 2014 deadline for replacing the ICD-9 code sets used to report medical diagnoses and inpatient procedures with ICD-10 code sets. (ICD is the World Health Organization’s International Statistical Classification of Diseases and Related Health Problems.) The switch is a HIPAA (Health Insurance Portability and Accountability Act) code set requirement, and it raises the number of diagnosis codes from 14,000 in ICD-9 to 68,000 in ICD-10.
Natural language processing has played the primary role in many solutions aimed at transforming large volumes of unstructured clinical data into information that healthcare IT application vendors and their hospital customers can leverage. But Amit Sheth, professor of computer science and engineering at Wright State University and director of the Kno.e.sis Center, argues that understanding the unstructured text of clinical notes, which hold a huge stash of information, and then mapping it to fine-grained ICD-10 coding schemes requires a combination of NLP, advanced linguistics, machine learning and semantic web technologies. (See our story yesterday for a look at how the NLP market is evolving overall, including in healthcare.)
“ICD-10 has thousands of codes with millions of possible permutations and combinations. A rule-based approach is not effective to cover the huge number of ICD-10 codes,” Sheth says. Extracting the correct concepts, identifying the relationships between them and mapping them to the correct code is a major challenge: a code is often formed from information found in several sections of a clinical document, and the document itself reflects the individual physician’s style of recording information, among other factors.
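To make the difficulty concrete, here is a minimal Python sketch of a purely rule-based mapper. It is an illustration only, not Sheth's or Kno.e.sis's method. The notes, the function names `extract_concepts` and `map_to_code`, and the keyword rules are all hypothetical, and the code fragments follow the ICD-10-CM style for a torus fracture of the radius but should be checked against the official code set before any real use.

```python
import re

# Hypothetical clinical note, split into sections as a physician might record it.
NOTE = {
    "history": "Patient fell on outstretched hand. First visit for this injury.",
    "exam": "Tenderness over the distal radius of the right wrist.",
    "imaging": "X-ray shows a torus (buckle) fracture of the lower end of the radius.",
}

# The same injury, written in a terser personal style (also hypothetical).
TERSE_NOTE = {"exam": "Buckle fx, R distal radius. New injury."}


def extract_concepts(note):
    """Toy keyword rules; a real system would use NLP rather than regexes."""
    text = " ".join(note.values()).lower()
    concepts = {}
    if re.search(r"\b(torus|buckle)\b", text) and "radius" in text:
        concepts["condition"] = "torus_fracture_lower_radius"
    if re.search(r"\bright\b", text):
        concepts["side"] = "right"
    elif re.search(r"\bleft\b", text):
        concepts["side"] = "left"
    if "first visit" in text or "initial" in text:
        concepts["encounter"] = "initial"
    elif "follow-up" in text or "subsequent" in text:
        concepts["encounter"] = "subsequent"
    return concepts


def map_to_code(concepts):
    """Assemble a code from three concepts; return None if any is missing."""
    # Illustrative fragments only: base category, laterality digit, encounter letter.
    base = {"torus_fracture_lower_radius": "S52.52"}
    side = {"right": "1", "left": "2"}
    encounter = {"initial": "A", "subsequent": "D"}
    try:
        return (
            base[concepts["condition"]]
            + side[concepts["side"]]
            + encounter[concepts["encounter"]]
        )
    except KeyError:
        return None


print(map_to_code(extract_concepts(NOTE)))        # S52.521A
print(map_to_code(extract_concepts(TERSE_NOTE)))  # None
```

The first note yields a full code only because the condition, the side and the encounter type, each taken from a different section, all happen to match the hand-written rules. The second note describes the same injury, but the abbreviations "R" and "New injury" fall outside the rules, so no code is produced. Covering every such variation for tens of thousands of codes is the scaling problem Sheth describes.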