The large volume of existing electronic health records can be leveraged to improve the efficiency and precision of medical professionals. Diagnostic assistant systems and preventive detectors of health risks are now feasible thanks to sufficiently large data repositories and powerful machine learning techniques. We aim to explore approaches for extracting patient evolution patterns from clinical histories written in Spanish, Catalan or English. The resulting patterns could support the development of diagnostic assistants or prevention policies. To achieve these aims, we will study the following aspects:

- Medical information extraction from clinical histories. We will focus on three related subobjectives: Medical Entity Recognition, covering entities such as diagnoses, medical procedures, signs/symptoms, drugs and body parts; Medical Entity Codification using different coding systems, such as CIE10 (ICD-10), CIAP2 (ICPC-2), SNOMED and ATC; and Relation Extraction, for relations such as occurs_in between a diagnosis and a body part. We will explore joint and ensemble deep learning architectures, the effect of medical word embeddings learned through transfer learning, and the effect of combining semi-supervised machine learning approaches with deep learning to obtain effective models from small training sets.
- Negation and speculation detection. Not all recognized medical entities carry the same certainty: some are speculative and others are negated. Detecting these cases is therefore a crucial step for accurate information extraction from medical text. Different approaches will be explored. Regarding the problem formulation, the task can be modelled either as a relation detection problem (locating cue-scope pairs) or as a subsequence detection problem similar to Named Entity Recognition (assigning negated, positive or speculative classes to the medical entities, i.e. the focus). Regarding the methods, state-of-the-art deep learning will be applied, but classical machine learning solutions will also be considered.
- Enrichment and approximate term search in medical ontologies. To represent medical concepts in any of the coding systems relevant to the project, for both Spanish and Catalan, we will explore techniques to integrate and enrich existing medical resources and ontologies (such as MetaMap, SNOMED, UMLS and BioPortal). We will also develop efficient similarity-based techniques for candidate medical concept retrieval that, given a term detected in a medical text (written with either standard or nonstandard grammar), return the most similar ontology entries.
- Knowledge discovery for risk prediction in multimorbid patients. Multimorbid patients are highly prevalent in some clinical contexts, such as primary care, but there is little evidence about how to deal with such patients. In collaboration with IDIAP JGol, we aim to automatically infer the patterns by which doctors predict the risk of new diseases for a multimorbid patient given the patient's clinical history. We will focus on building chronological/semantic graphs by applying the deep models learned for medical information extraction to patients' histories. We will then study different data mining techniques for inferring generalized disease prevention patterns through traversals of these graphs.
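As an illustration of the similarity-based candidate retrieval mentioned above, the following is a minimal sketch, not the project's method. It assumes character-trigram Jaccard similarity as the similarity measure (the abstract does not fix one), and the ontology entries, the codes `X1` to `X3`, and all function names are hypothetical placeholders.

```python
import unicodedata


def normalize(term):
    """Lowercase and strip diacritics so accented and unaccented spellings match."""
    decomposed = unicodedata.normalize("NFKD", term.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def char_ngrams(term, n=3):
    """Return the set of character n-grams of the normalized, space-padded term."""
    pad = " " * (n - 1)
    padded = pad + normalize(term) + pad
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_candidates(term, ontology, top_k=3):
    """Rank ontology entries by n-gram similarity to the detected term.

    `ontology` maps a concept code to its label; returns up to `top_k`
    (code, label, score) tuples, best match first.
    """
    query = char_ngrams(term)
    scored = [
        (code, label, jaccard(query, char_ngrams(label)))
        for code, label in ontology.items()
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


# Hypothetical toy ontology: the codes are placeholders, not real identifiers.
TOY_ONTOLOGY = {
    "X1": "neumonía",
    "X2": "diabetes mellitus",
    "X3": "hipertensión arterial",
}

if __name__ == "__main__":
    # A misspelled, unaccented mention as it might appear in a clinical note.
    for code, label, score in retrieve_candidates("hipertension arteral", TOY_ONTOLOGY):
        print(code, label, round(score, 2))
```

A linear scan like this only suits small vocabularies. For ontologies of realistic size, an inverted index over n-grams or an approximate nearest-neighbour search over term embeddings would likely be needed to meet the efficiency goal.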
PLAN ESTATAL DE INVESTIGACIÓN CIENTÍFICA Y TÉCNICA Y DE INNOVACIÓN 2017-2020
PROGRAMA ESTATAL DE I+D+I ORIENTADA A LOS RETOS DE LA SOCIEDAD