Context-aware error detection for relational datasets using Large Language Models

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

Detecting erroneous values in datasets remains a challenging and time-consuming task. Errors can be syntactic, where values do not conform to the structure or domain of the other values, or semantic, where values are syntactically correct but appear in the wrong context. The variability of the contexts in which errors occur makes it hard to design a single tool that detects all errors in all contexts. Existing methods address parts of this problem, but accounting for context during detection remains challenging and often relies on expensive human intervention. In this research, we developed a new tool that leverages the context awareness of Large Language Models (LLMs) to perform context-aware detection of both syntactic and semantic errors. By pruning datasets to optimize the size and quality of the input, and by employing prompt engineering tailored to error detection, the tool extends the range of detectable syntactic and semantic errors, catching errors that cannot be detected otherwise.
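The two ingredients named in the abstract, pruning the dataset to fit the model's input budget and prompting the model to flag context-inconsistent values, can be illustrated with a minimal sketch. All function names and the prompt wording below are assumptions for illustration, not the thesis's actual implementation, and the model call itself is omitted.

```python
# Hypothetical sketch of the pruning + prompting approach; not the thesis's
# actual implementation.

def prune_column(values, max_values=20):
    """Keep at most max_values distinct values, preserving first-seen order,
    so the column's context fits into the LLM's input budget."""
    seen = []
    for v in values:
        if v not in seen:
            seen.append(v)
        if len(seen) == max_values:
            break
    return seen

def build_prompt(column_name, values):
    """Assemble an error-detection prompt that supplies column context so the
    model can flag both syntactic and semantic outliers."""
    listed = "\n".join(f"- {v}" for v in values)
    return (
        f"The following values come from the column '{column_name}' of a "
        "relational dataset:\n"
        f"{listed}\n"
        "List any values that are syntactically malformed or semantically "
        "inconsistent with the rest of the column."
    )

# Example: 'Pluto' is syntactically valid but semantically wrong in a column
# of European capitals, while '12#4' is syntactically malformed.
sample = ["Paris", "Berlin", "Paris", "Pluto", "12#4", "Madrid"]
pruned = prune_column(sample)
prompt = build_prompt("capital_city", pruned)
print(prompt)
```

The pruning step here simply deduplicates and caps the value list; the resulting prompt would then be sent to an LLM, whose response is parsed for the flagged values.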

Keywords

Dataset Context; Syntactic Structure; Semantics; Large Language Models (LLMs)
