Job description
- Data harmonization: develop methods to map free-text clinical data to standardized coding systems and ontologies, ensuring compliance with FAIR principles.
- AI model innovation: select, adapt, and refine large language models (local, cluster, or cloud-based) and related tools and frameworks (Ollama, OntoGPT, LangChain, etc.) for automated data recoding.
- Prompt and agentic workflow engineering: devise and implement best practices for improving language model performance in data extraction and ontology mapping.
- Use case development: collaborate with researchers and clinicians to apply your solutions in real-world scenarios, such as integrating rare disease alerts into EHRs or re-analyzing existing cohorts.
- Interdisciplinary collaboration: work across data science, software engineering, genomics, and clinical teams to create scalable solutions that enhance patient care and research outcomes.
Project: AI-driven FAIR data extraction and harmonization
By converting clinical notes and cohort variables into standard coding systems, you will help create datasets large enough for automated analysis and advanced diagnostics. Imagine helping rare disease patients by mapping textual symptom descriptions to precise phenotypic codes, which can then be combined with genomic data to identify potential causative variants. Or envision scaling your methods to unify data from multiple large cohort studies of healthy child development, seamlessly integrating local data models with emerging APIs such as DataSHIELD, Beacon, or FAIR Data Point to make the data discoverable and analyzable, and to build new global collaborations.
Your research will focus on using state-of-the-art large language models (LLMs) to drive this conversion process, guided by many open questions. Which model types and sizes are most effective? How should they be prompted, orchestrated, and validated for optimal accuracy? Could we deploy them locally on our own cluster, or should we tap into cloud resources? Can we enable our partner universities and hospitals to run them locally in a federation? You will experiment with existing tools and agentic frameworks such as Ollama, LangChain, and OntoGPT to discover and refine best practices.
You will develop novel methods with direct real-world impact: from improving patient diagnoses and enabling large-scale reuse of anonymized data for research, to laying the groundwork for deeper integration with electronic health records and bringing these methods into mainstream healthcare. The UMCG is a world leader in integrating AI into healthcare processes, and this project will build on that position to achieve global impact. Join our team of forward-thinking researchers and clinicians to shape the future of AI-driven data extraction and harmonization for healthcare.