Using LLMs to Generate Automatic Feedback on Object-Oriented Programs with Multiple Classes

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

With the public introduction of Large Language Models (LLMs) like ChatGPT, their potential for automatic hint and feedback generation in programming education has become a topic of research interest. However, most of this research focuses on beginner programmers creating small programs. In this thesis, we investigate the capabilities of LLMs to give feedback on OOP-related misconception characteristics in larger code solutions. Due to the lack of available datasets for this purpose, we additionally investigate how well LLMs can generate code solutions for larger programming exercises. Specifically, we use Mistral Large to generate code solutions that each contain one of six misconception characteristics taken from previous work. Next, we create a system that generates feedback for code solutions spread over multiple files, and use this system to generate feedback for the code solutions in our dataset. Our results show that the generated feedback is correct and appropriate for high-level beginner students, but that the LLMs used can consistently detect only the more general characteristics and struggle to identify the more complex ones. Overall, our work extends existing literature by exploring the capabilities of current LLMs with regard to larger, more complex code solutions. We find that although their performance is decent, extensive prompting is needed to obtain these results, and their ability to detect more complex misconception characteristics is limited.

Keywords

LLMs; automatic feedback generation; buggy code generation; OOP; GPT-4o mini; Mistral Large
