Unit testing is a critical aspect of software development, focused on verifying the smallest functional units of a system at an early stage. In recent years, Large Language Models (LLMs) have demonstrated significant potential in code generation thanks to their strong code comprehension capabilities. Existing research has explored various approaches to applying LLMs to unit testing tasks and has made notable progress. However, these studies are often constrained by limited datasets and a narrow range of models, and lack a systematic evaluation and analysis of LLM performance on unit testing. This gap limits the industry's ability to comprehensively understand the capabilities of LLMs.
Ye Shang, a Master's student in the iSE Lab, conducted a large-scale empirical study to explore how LLMs actually perform on unit testing. The study covers three unit testing tasks: test generation, assertion generation, and test evolution. It systematically evaluates a wide range of LLMs spanning different model families, architectures, and parameter sizes, using five widely used benchmark datasets and eight code-related evaluation metrics. The analysis examines LLM performance in unit testing from multiple perspectives, including comparisons with existing state-of-the-art approaches, prompt engineering strategies, and other influencing factors. The study presents several meaningful findings, offering practical guidance for effectively applying LLMs to industrial unit testing and pointing to future directions for improving LLM performance in this domain.
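To illustrate one of the three tasks, the following is a minimal, hypothetical sketch of assertion generation: the model is given a focal method and a test prefix and is asked to produce the missing assertion. The class and method names are invented for illustration and do not come from the study's benchmarks.

```java
// Hypothetical assertion generation example (JUnit 4).
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class CalculatorTest {

    // Focal method under test (normally located in the production code base).
    static int add(int a, int b) {
        return a + b;
    }

    @Test
    public void testAdd() {
        int result = add(2, 3);     // test prefix provided to the model as input
        assertEquals(5, result);    // assertion the model is expected to generate
    }
}
```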
The paper, titled "A Large-scale Empirical Study on Fine-tuning Large Language Models for Unit Testing," has been accepted by ISSTA 2025, a leading international conference in software engineering (CCF-A), with 23 out of 553 submissions accepted. The research is supported by the National Natural Science Foundation of China and the CCF-Huawei Poplar Fund for Software Engineering. Building on this work, an intelligent unit test generation approach with iterative repair has been further developed and integrated into Huawei's developer tools for practical use.