03 Apr Mayo Clinic and IBM to move beyond EMRs to deliver knowledge at the point of care
Madison – Biomedical informatics researchers at Mayo Clinic and IBM have launched a Web site for the newly founded Open Health Natural Language Processing (NLP) Consortium. The consortium is establishing the open-source space to promote past and current development efforts, including participation in information extraction from electronic medical records.
Mayo Clinic and IBM Healthcare released clinical NLP technologies into the public domain. The site will allow the approximately 2,000 researchers and developers working on clinical language systems worldwide to contribute code and further develop the systems. Additionally, the VA Boston Healthcare System and Seattle Group Health have strongly indicated their support of the concept, according to IBM.
“We are inviting our international colleagues to help continue development of these valuable tools,” says Christopher Chute, M.D., Dr.P.H., Mayo Clinic bioinformatics expert and senior consultant on the project. “By making it an open-source initiative, we hope to enable wide use of these NLP tools so medical advancements can happen faster and more efficiently.”

“The American Recovery and Reinvestment Act signed by President Obama includes over $17 billion to modernize and extend the electronic medical record across the country,” said Dan Pelino, IBM’s general manager, Global Healthcare & Life Sciences.
“NLP programs are [...] With them, it becomes an efficient and valuable diagnostic and scientific resource. Clearly the administration views the digital record and efficient use of this data as an integral part of their health policy and reform.”
Pelino added, “Adoption of this technology will provide physicians with insights into each patient’s condition, allowing them to electronically retrieve the exact knowledge they seek from patient health records rather than reading through every record provided, as they must do today. Patient privacy is a main concern and consideration; all current and any future required safeguards and regulations would be adhered to strictly.”
Will NLP actually save money?
“Yes,” said Pelino. “Not only will it shrink the time it takes to search through paper files or search electronic records manually, it will save money by allowing doctors to make the right diagnosis the first time, thus getting the right treatment to an individual faster. It will also save money by advancing the science that improves the actual therapies. Natural Language Processing allows doctors to compare notes with many others and share experience and knowledge that is currently difficult to extract from huge amounts of non-structured data.”
“For instance, a medical specialist may have seen this same condition your doctor is looking at now, and determined an effective treatment for it ten years ago, but because it’s buried in stored notes the knowledge can’t help anyone. NLP will help keep doctors from re-learning what others have already discovered but didn’t have the opportunity to pass on.”
“The success of such reforms rides on delivering interconnected and intelligent information to health care professionals everywhere. Mayo and IBM are tapping into the collaborative power of the open-source community to speed the development of Natural Language Processing (NLP).”
NLP is a relatively new and specialized area within computer science dealing with computational methods for understanding human language. In medicine, clinical NLP systems process the vast repositories of text generated by patient-clinician interactions. Such systems categorize and structure it according to standard nomenclature — in this case focusing on terms used in a range of medical specialties — that will ultimately speed data searches for both diagnoses and medical research. These NLP platforms or “pipelines” aid indexing and searching electronic medical records within institutions to quickly find similar cases or conditions, so physicians are not reliant solely on their own clinical experience in analyzing a problem. Researchers may also use these tools to aid retrospective epidemiological studies or do groundwork for new clinical trials.
What does it mean for patients?
“Patients will benefit from the knowledge and expertise beyond that of their own physician. This tool will increase the efficiency and coordination of care, possibly preventing medical errors and providing more effective treatment options,” said Pelino.
“Large-scale information extraction from the clinical narrative is a vital component in advancing translational research and patient care,” adds Guergana Savova, Ph.D., medical informatics specialist and Mayo’s NLP lead on the project. “It ‘unlocks’ the clinical textual data that resides in huge repositories. Such technology would allow for large-scale data aggregation, analyses and usage — just imagine the power of data from millions of patients.”
“There is a treasure trove of historical unstructured data that provides essential information for the study of disease progression, treatment effectiveness and long-term outcome which NLP systems make available to clinicians and researchers,” states Anni Coden, Ph.D., IBM’s NLP principal on the project.
“Such data can provide guidance for prospective studies and furthermore facilitate the integration of data from multi-modal data sources.”
As health care and academic medical centers adopt electronic medical records, searching and extracting information from them in an automated fashion is becoming more critical to delivering knowledge at the point of care.
Use as diagnostic tool:
Pelino told WTN News, “This tool will allow doctors to mine the medical records in their specialty practice to find similar cases to study and compare before making difficult diagnoses or before determining treatments. Instead of consulting one colleague or depending solely on their own personal experience, they’ll be able to review any physician notes or information about similar cases.”
The Mayo Clinic and IBM jointly developed a system for extracting information from more than 25 million free-text clinical notes based on IBM’s open-source Unstructured Information Management Architecture (UIMA). As part of the system, developers build strings of “annotators” that become a pipeline, allowing physicians to mine the text for references to specific conditions, drugs, diseases, signs and symptoms; anatomical areas or organs; or treatment procedures.
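The idea of chaining annotators into a pipeline can be illustrated with a small sketch. This is not the UIMA API or the Mayo/IBM code: the `Document`, `Annotation`, `make_dictionary_annotator` and `run_pipeline` names, as well as the tiny term lists, are hypothetical and chosen only for illustration. Each annotator reads the note text and adds typed spans to a shared document, and the pipeline simply runs them in order.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    kind: str   # e.g. "drug" or "condition"
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    text: str   # the matched text itself


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


Annotator = Callable[[Document], None]


def make_dictionary_annotator(kind: str, terms: List[str]) -> Annotator:
    """Build an annotator that marks every whole-word occurrence of the terms."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )

    def annotate(doc: Document) -> None:
        for match in pattern.finditer(doc.text):
            doc.annotations.append(
                Annotation(kind, match.start(), match.end(), match.group(0))
            )

    return annotate


def run_pipeline(text: str, annotators: List[Annotator]) -> Document:
    """Run each annotator in order over the same document."""
    doc = Document(text)
    for annotator in annotators:
        annotator(doc)
    return doc


# Hypothetical, deliberately tiny vocabularies (assumptions for illustration).
PIPELINE = [
    make_dictionary_annotator("drug", ["aspirin", "metformin"]),
    make_dictionary_annotator("condition", ["diabetes", "chest pain"]),
]

note = "Patient with diabetes reports chest pain. Started aspirin."
doc = run_pipeline(note, PIPELINE)
found = [(a.kind, a.text) for a in doc.annotations]
print(found)
```

A production system would draw its terms from a standard medical nomenclature rather than hard-coded lists, and later annotators would typically build on the spans produced by earlier ones.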
IBM and Mayo Clinic have also developed a system to extract cancer disease characteristics from unstructured pathology reports to facilitate “consistent retrieval and transmission of cancer cases.” The system extracts tumor characteristics, lymph node status and metastatic disease information enabling the automatic computation of cancer stage, which is critical to determine optimal treatment.
Research tool:
“This application of technology will allow medical researchers to search and find similar examples of cases on which to base future studies, including clinical trials. It may help point out possible correlations or connections between conditions that otherwise might not be apparent. It would also be valuable in developing data for larger scale medical population studies (epidemiology). All of this will add to medical knowledge of disease, which is directly applied to patient care,” said Pelino.
The two clinical text solutions released open-source by Mayo Clinic and IBM aim at processing two specific types of notes. Clinical notes describe patient-physician encounters, while pathology reports center around tissue findings. Both options are already adding value for Mayo and its patients:
- Physicians can research past records to examine earlier cases of rare conditions, thereby “conferring” with their colleagues across time to aid diagnosis and treatment decisions.
- Retrospective studies of tissue samples can propel new research findings, as happened with a major breast cancer finding at Mayo in 2008.
- Enhanced ability to mine data and determine potential study factors or participants has already enabled individualized medicine treatments in psychiatric care.
Mayo’s open-source solution, clinical Text Analysis and Knowledge Extraction System (cTAKES), focuses on processing the patient-centric clinical notes. Its low-level components, for example the software that discovers sentence and word boundaries, assigns word part-of-speech tags and forms phrases out of the words, are “trained” to understand clinical language. The higher-level information extraction components, for example the ones that determine which textual spans are highly relevant to the clinical meaning of a note, are specifically designed for this domain.
cTAKES functionality recognizes whether a clinical concept is negated and whether it is relevant to the patient or to the patient’s family, attributes that are critical to understanding patient-centered medical language.
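To make the negation and patient-versus-family distinction concrete, here is a minimal rule-based sketch. It is not how cTAKES is implemented: the trigger-word lists and the `concept_attributes` function are hypothetical, and the rule (look for a trigger word anywhere before the concept in the same sentence) is a deliberate simplification.

```python
import re
from typing import Dict, Tuple

# Hypothetical trigger lists, for illustration only.
NEGATION_TRIGGERS: Tuple[str, ...] = ("no", "denies", "without", "negative for")
FAMILY_TRIGGERS: Tuple[str, ...] = (
    "mother", "father", "sister", "brother", "family history",
)


def concept_attributes(sentence: str, concept: str) -> Dict[str, object]:
    """Return negation and subject attributes for one concept in one sentence."""
    lowered = sentence.lower()
    position = lowered.find(concept.lower())
    if position < 0:
        raise ValueError(f"concept {concept!r} not found in sentence")

    # Only the text before the concept is inspected for trigger words.
    preceding = lowered[:position]

    def has_trigger(triggers: Tuple[str, ...]) -> bool:
        return any(
            re.search(r"\b" + re.escape(t) + r"\b", preceding) for t in triggers
        )

    return {
        "negated": has_trigger(NEGATION_TRIGGERS),
        "subject": "family" if has_trigger(FAMILY_TRIGGERS) else "patient",
    }


print(concept_attributes("Patient denies chest pain.", "chest pain"))
print(concept_attributes("Mother had breast cancer.", "breast cancer"))
```

The point of the sketch is that "chest pain" in "Patient denies chest pain" must not be indexed as a finding, and "breast cancer" in a family-history sentence must not be attributed to the patient.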
IBM’s medKAT system (medical Knowledge Analysis Tool) is a UIMA-based, modular and flexible system that uses advanced NLP techniques to extract structured information from unstructured data sources, such as pathology reports, clinical notes, discharge summaries and medical literature.
medKAT/P is a version customized for the pathology domain, based on a representation of cancer, its characteristics and disease progression. The system recognizes concepts such as primary tumor and its associated attributes (e.g. histology, anatomical site, etc.) or lymph node status and its associated attributes (e.g. number of positive and excised nodes) by identifying mentions (e.g. histology or anatomical sites) and their relations (including negation). medKAT can be viewed as a development platform that is adaptable to user and domain requirements.
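As a rough illustration of turning a pathology sentence into structured attributes such as the number of positive and excised lymph nodes, consider the sketch below. It is not medKAT/P code: the `NODE_PATTERN` regular expression and the `lymph_node_status` function are hypothetical, and they handle only one common phrasing ("3 of 12 lymph nodes").

```python
import re
from typing import Dict, Optional

# Matches phrasings like "3 of 12 lymph nodes" or "0 out of 5 axillary lymph nodes".
# The pattern is an assumption for illustration; real reports vary far more widely.
NODE_PATTERN = re.compile(
    r"(\d+)\s+(?:of|out of)\s+(\d+)\s+(?:\w+\s+)?lymph nodes?",
    re.IGNORECASE,
)


def lymph_node_status(report: str) -> Optional[Dict[str, int]]:
    """Extract positive and excised node counts, or None if no match is found."""
    match = NODE_PATTERN.search(report)
    if match is None:
        return None
    return {"positive": int(match.group(1)), "excised": int(match.group(2))}


report = "Metastatic carcinoma in 3 of 12 axillary lymph nodes."
print(lymph_node_status(report))
```

Structured values like these are the kind of input a downstream step would need in order to compute a cancer stage; the staging rules themselves are outside the scope of this sketch.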
“This system has been designed to operate within institutional systems or databases of any size,” said Pelino. “Patient privacy is a main concern and consideration; all current and any future required safeguards and regulations will be adhered to strictly.”