Recommending Good First Issues in GitHub OSS Projects
Authors: *Wenxin Xiao, *Hao He, Weiwei Xu, Xin Tan, Jinhao Dong, and Minghui Zhou Venue: The 2022 IEEE/ACM 44th International Conference on Software Engineering Links: [DOI][BibTeX][PDF][Code] Notes: *Joint First Authors
Attracting and retaining newcomers is vital for the sustainability of an open-source software project. However, it is difficult for newcomers to locate suitable development tasks, while existing "Good First Issues" (GFI) on GitHub are often insufficient and inappropriate. In this paper, we propose RecGFI, an effective practical approach for the recommendation of good first issues to newcomers, which can be used to relieve maintainer burden and help newcomers onboard. RecGFI models an issue with features from multiple dimensions (content, background, and dynamics) and uses an XGBoost classifier to generate its probability of being a GFI. To evaluate RecGFI, we collect 53,510 resolved issues among 100 GitHub projects and carefully restore their historical states to build ground truth datasets. Our evaluation shows that RecGFI can achieve up to 0.853 AUC in the ground truth dataset and outperforms alternative models. Our interpretable analysis of the trained model further reveals interesting observations about GFI characteristics. Finally, we report latest open issues (without GFI-signaling labels but recommended as GFI by our approach) to project maintainers among which 16 are confirmed as real GFIs. Among the 16 confirmed GFIs, two issues have attracted newcomer attention and one has already been resolved by a newcomer.