Suboptimal Comments in Java Projects: From Independent Comment Changes to Commenting Practices
Authors: Chao Wang, Hao He, Uma Pal, Darko Marinov, and Minghui Zhou
Venue: ACM Transactions on Software Engineering and Methodology
Links: [DOI][PDF][Code]
While code comments are valuable for software development, code often has low-quality comments or misses comments altogether, which we call suboptimal comments. Such suboptimal comments create challenges in code comprehension and maintenance. Despite substantial research on suboptimal comments, empirical knowledge about the commenting practices that produce suboptimal comments and the reasons that lead to them is lacking. We help bridge this knowledge gap by investigating three kinds of artifacts: (1) independent comment changes (ICCs), that is, comment changes committed independently of code changes, which likely address suboptimal comments; (2) written commenting guidelines; and (3) comment-related tools. Guidelines and tools are often employed to help commenting practice, especially to prevent suboptimal comments. We collect 24M+ comment changes from 4,392 open-source Java repositories and find that ICCs widely exist. The ICC ratio (the proportion of comment changes that are ICCs) is approximately 15.5%, with 98.7% of the repositories having ICCs. Our thematic analysis of 3,533 randomly sampled ICCs provides a three-dimensional taxonomy for what is changed (13 comment subcategories), how it is changed (six commenting activity categories), and what factors are associated with the change (three factors). We investigate 600 repositories to understand the prevalence, content, impact, and violations of commenting guidelines. We find that only 15.5% of the 600 sampled repositories have any commenting guidelines. We provide the first taxonomy for elements in commenting guidelines: where and what to comment are particularly important. The repositories with such guidelines have a statistically significantly lower ICC ratio. However, commenting guidelines are not strictly followed: 85.5% of checked repositories have violations. We systematically study how developers use two kinds of tools, comment-checking tools and comment-generating tools, in the 4,392 repositories. We find that the use of the Javadoc tool is negatively correlated with the ICC ratio, while the use of Checkstyle is not, and that the use of comment-generating tools leads to a higher ICC ratio. To conclude, we reveal issues and challenges in current commenting practice, which help us understand how suboptimal comments are introduced. We finally provide implications for conducting research, formulating practices, and improving tools.
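The ICC ratio defined in the abstract can be illustrated with a small Python sketch. It is not the authors' pipeline: the `CommentChange` record, its field names, and the toy data are hypothetical, and it assumes a comment change counts as independent simply when it was not committed together with a code change, which may be coarser than the paper's actual criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommentChange:
    """One comment change; all field names are hypothetical, not from the paper."""
    repo: str
    commit: str
    touches_code: bool  # True if committed together with a code change


def icc_ratio(changes):
    """Return the share of comment changes that are ICCs, or None if there are none."""
    changes = list(changes)
    if not changes:
        return None
    # Assumption: a change is "independent" when no code change accompanies it.
    independent = sum(1 for c in changes if not c.touches_code)
    return independent / len(changes)


# Toy data for illustration only; not taken from the study's dataset.
changes = [
    CommentChange("demo/repo", "a1", touches_code=True),
    CommentChange("demo/repo", "a2", touches_code=False),
    CommentChange("demo/repo", "a3", touches_code=True),
    CommentChange("demo/repo", "a4", touches_code=True),
]
ratio = icc_ratio(changes)  # one ICC out of four comment changes
```

Computing the ratio per repository in this way would allow the kind of comparison the abstract describes, for example between repositories with and without commenting guidelines.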