A Large-Scale Empirical Study on Java Library Migrations: Prevalence, Trends, and Rationales

Published in Under Review, 2021

With the rise of open-source software and package hosting platforms, reusing third-party libraries has become a common practice. Due to risks including security vulnerabilities, lack of maintenance, unexpected failures, and license issues, a project may completely remove a used library and replace it with another library, which we call library migration. Despite substantial research on dependency management, the understanding of how and why library migrations occur is still lacking. Achieving this understanding may help practitioners optimize their library selection criteria, develop automated approaches to monitor dependencies, and provide migration suggestions for their libraries or software projects. In this paper, through a fine-grained commit-level analysis of 19,652 Java GitHub projects, we extract the largest migration dataset to date (1,194 migration rules, 3,163 migration commits). We show that 8,065 projects have at least one library removal and 1,564 (lower bound) to 5,004 (upper bound) projects have at least one migration, indicating the prevalence of library migrations. We find that projects with library removals have one removal per 139 commits, and projects with migrations have a median of 2 to 4 migrations. We discover that library migrations are dominated by several domains and follow a long-tail distribution. Also, migrations are highly unidirectional in that libraries are either mostly abandoned or mostly chosen in our project corpus. A thematic analysis of related commit messages, issues, and pull requests identifies 14 frequently mentioned migration reasons, 7 of which are not discussed in previous work. Our findings can be operationalized into actionable insights for package hosting platforms, project maintainers, and library developers.

Recommended citation: Hao He, Runzhi He, Haiqiao Gu, and Minghui Zhou. A Large-Scale Empirical Study on Java Library Migrations: Prevalence, Trends, and Rationales. Under Review. PDF.

MigrationAdvisor: Recommending Library Migrations from Large-Scale Open-Source Data

Published in ICSE 2021 Demonstration Track, 2021

During software maintenance, developers may need to migrate an already in-use library to another library with similar functionalities. However, it is difficult to make the optimal migration decision with limited information, knowledge, or expertise. In this paper, we present MigrationAdvisor, an evidence-based tool to recommend library migration targets through intelligent analysis upon a large number of GitHub repositories and Java libraries. The migration advisories are provided through a search engine style web service where developers can seek migration suggestions for a specific library. We conduct systematic evaluations on the correctness of results, and evaluate the usefulness of the tool by collecting usage feedback from industry developers. Video:

Recommended citation: Hao He, Yulin Xu, Xiao Cheng, Guangtai Liang and Minghui Zhou. MigrationAdvisor: Recommending Library Migrations from Large-Scale Open-Source Data. Accepted by ICSE 2021 Demonstration Track. Acceptance Rate: 37.1% (23/62). PDF.

A Multi-Metric Ranking Approach for Library Migration Recommendations

Published in Proceedings of the 28th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2021), 2021

The wide adoption of third-party libraries in software projects is beneficial but also risky. An already-adopted third-party library may be abandoned by its maintainers, may have license incompatibilities, or may no longer align with current project requirements. Under such circumstances, developers need to migrate the library to another library with similar functionalities, but the migration decisions are often opinion-based and sub-optimal with limited information at hand. Therefore, several filtering-based approaches have been proposed to mine library migrations from existing software data to leverage “the wisdom of crowd,” but they suffer from either low precision or low recall with different thresholds, which limits their usefulness in supporting migration decisions. In this paper, we present a novel approach that utilizes multiple metrics to rank and therefore recommend library migrations. Given a library to migrate, our approach first generates candidate target libraries from a large corpus of software repositories, and then ranks them by combining the following four metrics to capture different dimensions of evidence from development histories: Rule Support, Message Support, Distance Support, and API Support. We evaluate the performance of our approach with 773 migration rules (190 source libraries) that we borrow from previous work and recover from 21,358 Java GitHub projects. The experiments show that our metrics are effective to help identify real migration targets, and our approach significantly outperforms existing works, with MRR of 0.8566, top-1 precision of 0.7947, top-10 NDCG of 0.7702, and top-20 recall of 0.8939. To demonstrate the generality of our approach, we manually verify the recommendation results of 480 popular libraries not included in prior work, and we confirm 661 new migration rules from 231 of the 480 libraries with comparable performance. The source code, data, and supplementary materials are provided at:

Recommended citation: Hao He, Yulin Xu, Yixiao Ma, Yifei Xu, Guangtai Liang and Minghui Zhou. A Multi-Metric Ranking Approach for Library Migration Recommendations. Proceedings of the 28th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2021). Acceptance Rate: 25.5% (42/165). PDF. Slides. 中文版.
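To illustrate the kind of ranking metrics reported in the abstract above, here is a minimal sketch of how MRR, top-1 precision, and top-k recall can be computed over ranked recommendation lists. The library names, the ground truth, and all function names are hypothetical, and the paper's exact metric definitions may differ from these common ones.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked: List[str], truth: Set[str]) -> float:
    """Return 1/rank of the first correct target, or 0.0 if none is ranked."""
    for position, library in enumerate(ranked, start=1):
        if library in truth:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results: Dict[str, List[str]],
                         truths: Dict[str, Set[str]]) -> float:
    """Average the reciprocal rank over all source libraries (queries)."""
    return sum(reciprocal_rank(results[q], truths[q]) for q in results) / len(results)


def top1_precision(results: Dict[str, List[str]],
                   truths: Dict[str, Set[str]]) -> float:
    """Fraction of queries whose first recommendation is a real target."""
    hits = sum(1 for q in results if results[q] and results[q][0] in truths[q])
    return hits / len(results)


def top_k_recall(results: Dict[str, List[str]],
                 truths: Dict[str, Set[str]], k: int) -> float:
    """Fraction of all real targets that appear within the top k results."""
    found = sum(len(set(results[q][:k]) & truths[q]) for q in results)
    total = sum(len(truths[q]) for q in results)
    return found / total


# Hypothetical ranked recommendations for two source libraries.
results = {
    "json-lib": ["gson", "jackson", "guava"],
    "log4j": ["guava", "slf4j", "logback"],
}
# Hypothetical ground-truth migration targets.
truths = {
    "json-lib": {"gson", "jackson"},
    "log4j": {"slf4j", "logback"},
}

print(mean_reciprocal_rank(results, truths))
print(top1_precision(results, truths))
print(top_k_recall(results, truths, k=3))
```

MRR rewards placing the first correct target near the top of the list, while top-k recall measures how many of the known targets are recovered at all, which is why the two are usually reported together.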

An Extensive Study of Independent Comment Changes in Java Projects

Published in Preparing for Another Submission..., 2020

While code comments are valuable for software development, code often has low-quality comments or misses comments altogether, which we call suboptimal comments. Such suboptimal comments create challenges in code comprehension and maintenance. Despite substantial research on suboptimal comments, empirical knowledge about why comments are suboptimal is lacking, affecting commenting practice and related research. We help bridge this knowledge gap by investigating independent comment changes (comment changes committed independently of code changes), which likely attempt to address suboptimal comments. We collect 23M+ comment changes from 4,410 open-source Java repositories and find that ∼16% of comment changes are independent, indicating a considerable amount of comments may be suboptimal. Our thematic analysis of 3,600 randomly sampled independent comment changes provides a two-dimensional taxonomy about what is changed (comment category) and how it changed (commenting activity category). We find some combinations of comment and activity categories have a relatively high frequency although those comments are not a large proportion of all comments; the reason may be that some comments easily become obsolete/inconsistent. By further inspecting extensive related materials for these independent comment changes, and validating it with a survey of 33 developer respondents, we find four reasons for suboptimal comments: belief in future actions, lack of comment guidelines, ineffective use of tools, and legacy. We finally provide implications for project maintainers, researchers, and tool designers.

Recommended citation: Chao Wang, Hao He, Uma Paroma, Darko Marinov, and Minghui Zhou. An Extensive Study of Independent Comment Changes in Java Projects. Preparing for Another Submission...

Understanding Source Code Comments at Large-Scale

Published in Proceedings of the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’19), 2019

Source code comments are important for any software, but the basic patterns of writing comments across domains and programming languages remain unclear. In this paper, we take a first step toward understanding differences in commenting practices by analyzing the comment density of 150 projects in 5 different programming languages. We have found that there are noticeable differences in comment density, which may be related to the programming language used in the project and the purpose of the project.

Recommended citation: Hao He. Understanding Source Code Comments at Large-Scale. In Proceedings of the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019), August 26–30, 2019, Tallinn, Estonia. ACM, New York, NY, USA, 3 pages.
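As a rough illustration of what a comment density measurement can look like, the sketch below computes the share of non-blank lines that are whole-line comments. This definition, the function name, and the restriction to a single line-comment prefix are assumptions made for illustration; the paper may define and measure comment density differently (for example, by also counting block or trailing comments).

```python
def comment_density(source: str, prefix: str = "//") -> float:
    """Share of non-blank lines that start with the given line-comment prefix.

    Simplification: block comments and trailing comments are not counted.
    """
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    if not lines:
        return 0.0
    comment_lines = sum(1 for line in lines if line.startswith(prefix))
    return comment_lines / len(lines)


# Hypothetical Java snippet used only to exercise the function.
java_sample = """\
// Adds two integers.
int add(int a, int b) {
    return a + b; // trailing comment, not counted
}
"""

print(comment_density(java_sample))
print(comment_density("# setup\nx = 1\n", prefix="#"))
```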

Poster: Retroreflective MIMO Communication

Published in Proceedings of the 20th International Workshop on Mobile Computing Systems and Applications, 2019

We propose to design a retroreflective MIMO channel based on polarization division multiplexing (PDM), with multiple LCD modulators and photodiode (PD) receivers. An LCD shutter works as a bi-state modulator that rotates the polarized light by 0° or 90°. With a polarizer on each side of the LCD, it can either retroreflect incoming light or absorb it. The retroreflected light is polarized to the angle of the front polarizer, which is imperceptible to human eyes but can be separated using polarizers on the PD receivers.

Recommended citation: Yue Wu, Kenuo Xu, Hao He, Zihang Wu and Chenren Xu. Poster: Retroreflective MIMO Communication. In Proceedings of the 20th International Workshop on Mobile Computing Systems and Applications (HotMobile 2019).