From Text to Table: Optimizing Prompt Design for Semantic Information Extraction in Dutch Healthcare. A Case Study on Mapping Hospital Action Plans for Reducing Waiting Lists to the Menzis Format Using ChatGPT

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

The growing interest in large language models such as ChatGPT has opened new possibilities for tasks requiring nuanced language understanding. ChatGPT's adaptability makes it especially suitable for niche applications, even in the absence of task-specific training data. In this context, Dutch health insurer Menzis is exploring AI-driven solutions to reduce administrative workload. Menzis receives action plans submitted by hospitals, which it uses to monitor and address excessive waiting times. These documents are typically written as free text that varies in style and expression, whereas Menzis requires the information in a standardized format. This mismatch makes automated processing challenging and limits the potential for efficient, scalable analysis. This thesis investigates whether a carefully designed prompt can enable ChatGPT to extract structured, semantically accurate information from such plans. A single prompt was developed based on the prompt engineering literature, iteratively refined within the ChatGPT interface, and tested on a dataset of eight synthetic action plans and one real-world plan. Each synthetic plan was crafted to reflect specific challenges. A qualitative evaluation assessed semantic correctness, structural accuracy, hallucination resistance, and robustness to variation. The final prompt performed reliably across all test cases, achieving consistent structural formatting and accurate content extraction with no observed hallucinations. Minor variation in phrasing and interpretation occurred, but the core information remained intact. These findings suggest that prompt-based extraction is a promising method for automating healthcare-related document processing. While human oversight remains necessary for ambiguous or rare cases, further tuning based on output preferences may reduce the need for manual review, supporting more efficient workflows at Menzis and beyond.
