The Power of Large Language Models: using OpenAI’s ChatGPT for Automatic Patch Generation

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

Automated Program Repair (APR) has emerged as a valuable tool for developers in the software development and maintenance process. Despite recent advances in deep learning (DL), DL-based APR approaches still have limitations. In particular, current state-of-the-art APR methods often require domain-specific knowledge and must be retrained when moving to a different programming language. This study explores Large Language Models (LLMs), specifically ChatGPT, as an alternative for patch generation, since they could overcome these limitations by not requiring domain knowledge and by adapting across programming languages without retraining. We evaluate ChatGPT on the Defects4J v2.0 benchmark, testing a total of 476 bugs and assuming perfect localization of the buggy lines. We then compare the ChatGPT-generated patches with the results of other state-of-the-art APR methods. Our findings show that ChatGPT performs worse than these methods in this setting. Despite these current limitations, our study highlights untapped potential in ChatGPT and other LLMs: with ongoing advancements, it is plausible that LLMs may surpass existing methods in the future, but they need further improvement and refinement to realize that potential.

Keywords

ChatGPT; Large Language Model; Automatic Program Repair; Patch Generation
