PhD candidate AI-driven FAIR data extraction and harmonization

Education: WO (university degree)
Working hours: 36 hours a week
Salary: € 3.017 - € 3.824

Closing date: March 23

Posted on March 4, 2025

Do you want to empower the future of healthcare by turning free text into structured, machine-readable information that can accelerate scientific discovery and transform the clinic? Join us in a pioneering PhD project that tackles one of the biggest challenges in healthcare and population research: extracting and harmonizing disparate data to make it FAIR (Findable, Accessible, Interoperable, Reusable).

Job description

Data harmonization: Develop methods to map free-text clinical data to standardized coding systems and ontologies, ensuring compliance with FAIR principles.
AI model innovation: Select, adapt, and refine large language models (local, cluster, or cloud-based) and frameworks (Ollama, OntoGPT, LangChain, etc.) for automated data recoding.
Prompt and agentic workflow engineering: Devise and implement best practices for improving language model performance in data extraction and ontology mapping.
Use case development: Collaborate with researchers and clinicians to apply your solutions in real-world scenarios, such as integrating rare disease alerts into EHRs or re-analyzing existing cohorts.
Interdisciplinary collaboration: Work across data science, software engineering, genomics, and clinical teams to create scalable solutions that enhance patient care and research outcomes.

Project: AI-driven FAIR data extraction and harmonization
By converting clinical notes and cohort variables into standard coding systems, you will help create datasets large enough for automated analysis and advanced diagnostics. Imagine helping rare disease patients by mapping textual symptom descriptions to precise phenotypic codes, which are then combined with genomic data to identify potential causative variants. Or envision scaling your methods to unify data from multiple large cohort studies for research into healthy child development, seamlessly integrating local data models with emerging APIs such as DataSHIELD, Beacon, or FAIR Data Point to enable data discovery and analysis and to build new global collaborations.

Your research will focus on leveraging state-of-the-art large language models for this conversion process, and it is motivated by many open questions. Which model types and sizes are most effective? How should they be prompted, orchestrated, and validated for optimal accuracy? Could we deploy them locally on our own cluster, or should we tap into cloud resources? Can we enable our partner universities and hospitals to run them locally in a federation? You will experiment with existing frameworks like Ollama, LangChain, and OntoGPT to discover and refine best practices.

You will develop novel methods that will have a direct real-world impact: from improving patient diagnoses and enabling large-scale reuse of anonymized data for research, to laying the groundwork for deeper integration with electronic health records so that these methods become part of mainstream healthcare. The UMCG is a world leader in integrating AI into healthcare processes, and we will leverage this position in this project to achieve global impact. Join our team of forward-thinking researchers and clinicians to shape the future of AI-driven data extraction and harmonization for healthcare.

Working environment

The position is part of the Genomics Coordination Centre (GCC), the ‘big data science’ research & service hub of the University Medical Centre Groningen (UMCG) and the University of Groningen (rank 66 worldwide, 3rd best place to work in the EU), hosted by the Department of Genetics. Our mission is to accelerate scientific discovery in health data with innovative methods and tools that expedite medical research and improve people's lives. We use open source software and large computer ‘clouds’, in particular the MOLGENIS software that we lead, but also DataSHIELD, Singularity, REDCap, XNAT, OpenStack, etc.

What do we need

Master’s degree in AI, Computer Science, Bioinformatics, or a related field.
Passion for machine learning, natural language processing, and biomedical data.
Strong analytical skills and a willingness to learn new techniques.
Excellent communication skills and a collaborative mindset.
Familiarity with relevant technologies is a big plus (e.g. ontologies, coding systems, FAIR data principles, agentic AI frameworks, programming in R, Java, or Python).

What do we offer

A dynamic research environment at the forefront of AI-driven healthcare innovation.
Access to diverse, real-world medical data sets and cutting-edge computational resources.
Support from and collaboration with the large MOLGENIS open source scientific software team to help you deploy and test your methods in working solutions.
Mentorship by leading experts in AI, genomics, and clinical informatics.
Opportunities to publish in high-impact journals and present at international conferences.

This is a full-time PhD contract for 4 years in an excellent environment for further development. First, a temporary one-year position will be offered, with the option of renewal for another 3 years. Your salary will be a minimum of € 2.901,- gross per month in the first year and a maximum of € 3.677,- (scale PhD) in the final (4th) year, based on a full-time appointment. In addition, the UMCG will offer you an 8% holiday allowance and an 8.3% end-of-year bonus. The conditions of employment comply with the Collective Labour Agreement for Medical Centres (CAO-UMC).

Apply now and join us in revolutionizing how medical data is utilized in the future of healthcare. We look forward to hearing from you!

For questions about the position

Any questions? Do contact us.

Joeri van der Velde, assistant professor

How to apply

Please use the digital application form at the bottom of this page; only applications submitted through this form will be processed. You can apply until 23 March 2025. Within half an hour after sending the digital application form, you will receive an email confirmation with further information.

The UMCG has a preventive Hepatitis B policy. The UMCG can provide you with the vaccination, should it be required for your position. For specific professions, a ‘Certificate of Good Conduct’ is required.

