Medical text analysis for disease prediction assistance

Type of activity
Competitive project
Acronym
TADIA-MED
Funding entity
AGENCIA ESTATAL DE INVESTIGACION
Funding entity code
PID2019-106942RB-C33
Amount
105.270,00 €
Start date
2020-06-01
End date
2023-05-31
Keywords
clinical text, deep learning, information extraction, machine learning, medical language processing, medical ontologies, multimorbidity, negation detection, risk pattern inference, speculation detection
Abstract
The large amount of existing electronic health records can be leveraged to improve the efficiency and precision of medical professionals.
Diagnostic assistant systems and preventive detectors of health risks are now possible thanks to the availability of sufficiently large data repositories and powerful machine learning techniques. We aim to explore approaches that allow the extraction of patient evolution patterns from clinical histories written in Spanish, Catalan or English. The resulting patterns could be useful for the development of diagnostic assistants or prevention policies.
To achieve the aims described above, we will focus on the following aspects:
- Medical information extraction from clinical histories. We will focus on three related subobjectives: Medical Entity Recognition, covering diagnoses, medical procedures, signs/symptoms, drugs, body parts, etc.; Medical Entity Codification using different coding systems, such as CIE10, CIAP2, SNOMED and ATC; and Relation Extraction, such as occurs_in between a diagnosis and a body part. We will explore joint and ensemble deep learning architectures, as well as the effects of applying medical word embeddings learned through transfer learning, and of combining semi-supervised ML approaches with deep learning to obtain effective models from small training sets.
- Negation and speculation detection. Not all recognized medical entities carry the same certainty: some are speculative and others are negated. Detecting these cases is therefore a crucial step for accurate information extraction from medical text. Different approaches will be explored. On the one hand, the problem can be modelled either as a relation detection problem (locating cue-scope pairs) or as a subsequence detection problem similar to Named Entity Recognition (negated, positive and speculative classes over the medical entities, i.e. the focus). On the other hand, although state-of-the-art deep learning methods will be applied, classical machine learning solutions will also be considered.
- Enrichment and approximate term search in medical ontologies. To enable the representation of medical concepts in any of the coding systems relevant to the project, for both Spanish and Catalan, we will explore techniques to integrate and enrich existing medical resources and ontologies such as MetaMap, SNOMED, UMLS and BioPortal. We will also develop efficient similarity-based techniques for candidate medical concept retrieval that, given a term detected in a medical text (written with either standard or nonstandard grammar), return the most similar ontology entries.
- Knowledge discovery for risk prediction in multimorbid patients. Multimorbid patients are highly prevalent in some clinical contexts, such as primary care, but there is little evidence about how to manage them. In collaboration with IDIAP JGol, we aim to automatically infer the patterns by which doctors predict the risk of new diseases for a multimorbid patient given their clinical history. We will focus on constructing chronological/semantic graphs by applying the deep models learned for extracting medical information from patients' histories. We will then study different data mining techniques for inferring generalized disease prevention patterns through traversals of these graphs.
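As an illustration of the similarity-based candidate retrieval described in the third aim above, the following minimal sketch ranks ontology entries by character trigram overlap (Dice coefficient) with a possibly misspelled term. It is only one simple way such a search could work and is not the project's method. The function names, the toy ontology and its codes (C001, etc.) are hypothetical and do not belong to any real coding system.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Return the multiset of character n-grams of a lowercased, space-padded string."""
    pad = " " * (n - 1)
    padded = f"{pad}{text.lower().strip()}{pad}"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def dice_similarity(a, b, n=3):
    """Dice coefficient over character n-gram multisets (1.0 means identical strings)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    overlap = sum((grams_a & grams_b).values())  # multiset intersection size
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2 * overlap / total if total else 0.0

def retrieve_candidates(term, entries, k=3):
    """Return the k entries most similar to `term` as (score, code, label) tuples."""
    scored = [(dice_similarity(term, label), code, label)
              for code, label in entries.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]

# Hypothetical toy ontology: codes are placeholders, not real identifiers.
TOY_ONTOLOGY = {
    "C001": "diabetes mellitus",
    "C002": "hypertension",
    "C003": "heart failure",
    "C004": "chronic kidney disease",
}

# A misspelled term, as might appear in a clinical note.
candidates = retrieve_candidates("diabtes melitus", TOY_ONTOLOGY, k=2)
for score, code, label in candidates:
    print(f"{code}\t{label}\t{score:.2f}")
```

Character n-grams tolerate typos and nonstandard spelling without a dictionary, but comparing a term against every entry does not scale to a full ontology. An efficient version would likely need an inverted n-gram index or an approximate nearest-neighbour search.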
Scope
State administration
Plan
PLAN ESTATAL DE INVESTIGACIÓN CIENTÍFICA Y TÉCNICA Y DE INNOVACIÓN 2017-2020
Resolution year
2020
Funding program
PROGRAMA ESTATAL DE I+D+I ORIENTADA A LOS RETOS DE LA SOCIEDAD
Funding call
RETOS DE INVESTIGACIÓN: PROYECTOS DE I+D+I
Grant institution
Agencia Estatal De Investigacion

Participants