A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.



Mining GitHub Repository Information using the Official REST API

2 minute read


GitHub provides an HTTP API (not very convenient or well documented) for requesting information from GitHub. We can use it to request repository information in JSON format. You can apply various search conditions and sort the results if necessary. For example, if you want to collect the 1000 most-starred repositories whose language is Java, you can use the following request.
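
As a rough sketch of what such a request can look like, the Python snippet below builds the search URLs without sending them. It assumes the public `/search/repositories` endpoint with the `language:` qualifier, and relies on the documented limits of 100 results per page and 1000 results per search, so ten pages are needed. The function name `build_search_urls` is made up for this illustration, and this is not necessarily the request used in the post.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_urls(language="java", total=1000, per_page=100):
    """Return the paginated search URLs that together cover `total` results."""
    pages = (total + per_page - 1) // per_page  # ceiling division
    urls = []
    for page in range(1, pages + 1):
        params = {
            "q": f"language:{language}",  # search qualifier: primary language
            "sort": "stars",              # rank by star count
            "order": "desc",              # most starred first
            "per_page": per_page,         # 100 is the maximum page size
            "page": page,
        }
        urls.append(f"{SEARCH_ENDPOINT}?{urlencode(params)}")
    return urls


urls = build_search_urls()
print(len(urls))
print(urls[0])
```

Each URL can then be fetched with any HTTP client. Unauthenticated search requests are rate limited, so a real crawler would likely send a token and pause between pages.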

Implementing a RISC-V CPU Simulator with a Five-Stage Pipeline

16 minute read


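RISC-V is an open-source architecture and instruction set standard that originated at Berkeley. This simulator implements the RV64I instruction set defined in RISC-V Specification 2.2 on a standard five-stage pipeline, together with a branch prediction module and virtual memory simulation. Implementing a complete CPU simulator is good practice for systems programming and deepens one's understanding of computer architecture. Before starting, you should read and thoroughly understand Chapter 4 of Computer Systems: A Programmer's Perspective, or the relevant chapters of Computer Organization and Design: The Hardware/Software Interface.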
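
To give a flavor of what a branch prediction module can look like, here is a minimal Python sketch of a table of 2-bit saturating counters. The excerpt does not say which prediction scheme the simulator uses, so the scheme, the class name `TwoBitPredictor`, and the table size are all assumptions made for illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low PC bits (illustrative)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # 0-1 predict not taken, 2-3 predict taken

    def _index(self, pc):
        # Base RV64I instructions are 4 bytes wide, so drop the low two bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


# A loop branch at a hypothetical address: taken nine times, then falls through.
predictor = TwoBitPredictor()
outcomes = [True] * 9 + [False]
correct = 0
for taken in outcomes:
    if predictor.predict(0x1000) == taken:
        correct += 1
    predictor.update(0x1000, taken)
print(correct, "of", len(outcomes), "predicted correctly")
```

The counter needs two consecutive mispredictions to flip its prediction, which is why a single loop exit does not disturb the prediction for the next run of the loop.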

Building Event System in Unity3D

3 minute read


When I was developing a simple 3D game using Unity 3D, I found it non-trivial to build an event system that could handle dynamic game events efficiently and elegantly.

Hidden Markov Model

1 minute read


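The Hidden Markov Model (HMM) is an important machine learning model. Intuitively, it addresses problems of the following kind: something has a state, but we cannot observe which state it is in at a given time (or position). We do, however, have a reference object whose state at each time (or position) is known, and we assume that its state is related to the state of the original object. We can then use machine learning to infer which state the original object is most likely in at a given time (or position). In other words, this is a model based on probability and statistics.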
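
One standard way to make this inference concrete is the Viterbi algorithm, which finds the most likely sequence of hidden states given the observed ones. The Python sketch below uses the common textbook example of a patient whose hidden state is Healthy or Fever and whose observed symptoms are normal, cold, or dizzy. All probabilities are illustrative assumptions, not values from the post.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for `observations`."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            column[s] = max(
                (prev[p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

path = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path)  # ['Healthy', 'Healthy', 'Fever']
```

Here the symptoms play the role of the reference object with known states, and the patient's condition is the hidden state being inferred.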






less than 1 minute read


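In fact, the $H(\vec{x},t)$ written in the previous post is exactly the summation form of a two-dimensional Fourier transform. The earlier brute-force computation is a two-dimensional Discrete Fourier Transform (DFT); using a two-dimensional Fast Fourier Transform (FFT) reduces the complexity from $O(n^4)$ to $O(n^2\log{n})$.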
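
The speedup comes from separability: a 2D transform on an $n \times n$ grid equals a 1D transform over every row followed by a 1D transform over every column, so $2n$ FFTs of cost $O(n\log{n})$ replace $n^2$ sums of $n^2$ terms each. The Python sketch below checks this on a small grid. It assumes the forward sign convention $e^{-2\pi i(ux+vy)/n}$ and a power-of-two $n$; the post's $H(\vec{x},t)$ may use a different sign or scaling, and the function names are illustrative.

```python
import cmath


def dft2_naive(grid):
    """Direct 2D DFT: n^2 outputs, each an n^2-term sum, so O(n^4) overall."""
    n = len(grid)
    return [[sum(grid[x][y] * cmath.exp(-2j * cmath.pi * (u * x + v * y) / n)
                 for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def fft1(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft1(values[0::2])
    odd = fft1(values[1::2])
    out = [0] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft2(grid):
    """2D FFT via separability: 1D FFT over rows, then over columns."""
    rows = [fft1(row) for row in grid]              # transform along y
    cols = [fft1(list(col)) for col in zip(*rows)]  # transform along x
    return [list(row) for row in zip(*cols)]        # transpose back to [u][v]


grid = [[complex(x * 4 + y) for y in range(4)] for x in range(4)]
slow = dft2_naive(grid)
fast = fft2(grid)
max_error = max(abs(slow[u][v] - fast[u][v]) for u in range(4) for v in range(4))
print(max_error < 1e-9)  # True
```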







A Minimal 2D Shooter Game Implemented in Java

RISC-V Simulator

A Simple RISC-V CPU Simulator with 5 Stage Pipeline, Branch Prediction and Cache Simulation

Intelligent and Secure Library Migration Recommendation

Library migration is a common development activity during software evolution. To support this activity, we design a multi-metric ranking algorithm to mine library migrations from large-scale open-source data. We further develop MigrationAdvisor, a demo tool to recommend library migrations. The backend data have been deployed in an internal tool at Huawei.


Poster: Retroreflective MIMO communication

Published in Proceedings of the 20th International Workshop on Mobile Computing Systems and Applications, 2019

We propose to design a retroreflective MIMO channel based on polarization division multiplexing (PDM), with multiple LCD modulators and photodiode (PD) receivers. An LCD shutter works as a bi-state modulator that rotates the polarized light by 0° or 90°. With a polarizer on each side of the LCD, it can retroreflect incoming light or absorb it. The retroreflected light is polarized to the angle of the front polarizer, which is imperceptible to human eyes but can be separated using polarizers on the PD receivers.

Recommended citation: Yue Wu, Kenuo Xu, Hao He, Zihang Wu and Chenren Xu. Poster: Retroreflective MIMO Communication. In Proceedings of the 20th International Workshop on Mobile Computing Systems and Applications (HotMobile 2019).

Understanding Source Code Comments at Large-Scale

Published in Proceedings of the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’19), 2019

Source code comments are important for any software, but the basic patterns of writing comments across domains and programming languages remain unclear. In this paper, we take a first step toward understanding differences in commenting practices by analyzing the comment density of 150 projects in 5 different programming languages. We have found that there are noticeable differences in comment density, which may be related to the programming language used in the project and the purpose of the project.

Recommended citation: Hao He. Understanding Source Code Comments at Large-Scale. In Proceedings of the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019), August 26–30, 2019, Tallinn, Estonia. ACM, New York, NY, USA, 3 pages.

An Extensive Study of Independent Comment Changes in Java Projects

Published in Preparing for Another Submission..., 2020

While code comments are valuable for software development, code often has low-quality comments or misses comments altogether, which we call suboptimal comments. Such suboptimal comments create challenges in code comprehension and maintenance. Despite substantial research on suboptimal comments, empirical knowledge about why comments are suboptimal is lacking, affecting commenting practice and related research. We help bridge this knowledge gap by investigating independent comment changes (comment changes committed independently of code changes), which likely attempt to address suboptimal comments. We collect 23M+ comment changes from 4,410 open-source Java repositories and find that ∼16% of comment changes are independent, indicating a considerable amount of comments may be suboptimal. Our thematic analysis of 3,600 randomly sampled independent comment changes provides a two-dimensional taxonomy about what is changed (comment category) and how it changed (commenting activity category). We find some combinations of comment and activity categories have a relatively high frequency although those comments are not a large proportion of all comments; the reason may be that some comments easily become obsolete/inconsistent. By further inspecting extensive related materials for these independent comment changes, and validating it with a survey of 33 developer respondents, we find four reasons for suboptimal comments: belief in future actions, lack of comment guidelines, ineffective use of tools, and legacy. We finally provide implications for project maintainers, researchers, and tool designers.

Recommended citation: Chao Wang, Hao He, Uma Paroma, Darko Marinov, and Minghui Zhou. An Extensive Study of Independent Comment Changes in Java Projects. Preparing for Another Submission...

A Multi-Metric Ranking Approach for Library Migration Recommendations

Published in Proceedings of the 28th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2021), 2021

The wide adoption of third-party libraries in software projects is beneficial but also risky. An already-adopted third-party library may be abandoned by its maintainers, may have license incompatibilities, or may no longer align with current project requirements. Under such circumstances, developers need to migrate the library to another library with similar functionalities, but the migration decisions are often opinion-based and sub-optimal with limited information at hand. Therefore, several filtering-based approaches have been proposed to mine library migrations from existing software data to leverage “the wisdom of crowd,” but they suffer from either low precision or low recall with different thresholds, which limits their usefulness in supporting migration decisions. In this paper, we present a novel approach that utilizes multiple metrics to rank and therefore recommend library migrations. Given a library to migrate, our approach first generates candidate target libraries from a large corpus of software repositories, and then ranks them by combining the following four metrics to capture different dimensions of evidence from development histories: Rule Support, Message Support, Distance Support, and API Support. We evaluate the performance of our approach with 773 migration rules (190 source libraries) that we borrow from previous work and recover from 21,358 Java GitHub projects. The experiments show that our metrics are effective to help identify real migration targets, and our approach significantly outperforms existing works, with MRR of 0.8566, top-1 precision of 0.7947, top-10 NDCG of 0.7702, and top-20 recall of 0.8939. To demonstrate the generality of our approach, we manually verify the recommendation results of 480 popular libraries not included in prior work, and we confirm 661 new migration rules from 231 of the 480 libraries with comparable performance. The source code, data, and supplementary materials are provided at:

Recommended citation: Hao He, Yulin Xu, Yixiao Ma, Yifei Xu, Guangtai Liang and Minghui Zhou. A Multi-Metric Ranking Approach for Library Migration Recommendations. Proceedings of the 28th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2021). Acceptance Rate: 25.5% (42/165). PDF. Slides. 中文版.
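
For readers unfamiliar with the ranking metrics reported in the abstract above, the Python sketch below computes two of them, MRR and top-k recall, using their usual textbook definitions. The paper's exact definitions may differ in detail, and the library names and rankings are hypothetical placeholders, not data from the study.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is returned."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (ranked list, relevant set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Hypothetical recommendations for two source libraries.
queries = [
    (["lib-b", "lib-a", "lib-c"], {"lib-a"}),           # first hit at rank 2
    (["lib-x", "lib-y", "lib-z"], {"lib-x", "lib-z"}),  # first hit at rank 1
]

print(mean_reciprocal_rank(queries))        # 0.75
print(recall_at_k(*queries[1], k=2))        # 0.5
```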

MigrationAdvisor: Recommending Library Migrations from Large-Scale Open-Source Data

Published in ICSE 2021 Demonstration Track, 2021

During software maintenance, developers may need to migrate an already in-use library to another library with similar functionalities. However, it is difficult to make the optimal migration decision with limited information, knowledge, or expertise. In this paper, we present MigrationAdvisor, an evidence-based tool to recommend library migration targets through intelligent analysis upon a large number of GitHub repositories and Java libraries. The migration advisories are provided through a search engine style web service where developers can seek migration suggestions for a specific library. We conduct systematic evaluations on the correctness of results, and evaluate the usefulness of the tool by collecting usage feedback from industry developers. Video:

Recommended citation: Hao He, Yulin Xu, Xiao Cheng, Guangtai Liang and Minghui Zhou. MigrationAdvisor: Recommending Library Migrations from Large-Scale Open-Source Data. Accepted by ICSE 2021 Demonstration Track. Acceptance Rate: 37.1% (23/62). PDF.

A Large-Scale Empirical Study on Java Library Migrations: Prevalence, Trends, and Rationales

Published in Under Review, 2021

With the rise of open-source software and package hosting platforms, reusing third-party libraries has become a common practice. Due to risks including security vulnerabilities, lack of maintenance, unexpected failures, and license issues, a project may completely remove a used library and replace it with another library, which we call library migration. Despite substantial research on dependency management, the understanding of how and why library migrations occur is still lacking. Achieving this understanding may help practitioners optimize their library selection criteria, develop automated approaches to monitor dependencies, and provide migration suggestions for their libraries or software projects. In this paper, through a fine-grained commit-level analysis of 19,652 Java GitHub projects, we extract the largest migration dataset to date (1,194 migration rules, 3,163 migration commits). We show that 8,065 projects have at least one library removal and 1,564 (lower bound) to 5,004 (upper bound) projects have at least one migration, indicating the prevalence of library migrations. We find that projects with library removals have one removal per 139 commits, and projects with migrations have a median of 2 to 4 migrations. We discover that library migrations are dominated by several domains and follow a long-tail distribution. Also, migrations are highly unidirectional in that libraries are either mostly abandoned or mostly chosen in our project corpus. A thematic analysis of related commit messages, issues, and pull requests identifies 14 frequently mentioned migration reasons, 7 of which are not discussed in previous work. Our findings can be operationalized into actionable insights for package hosting platforms, project maintainers, and library developers.

Recommended citation: Hao He, Runzhi He, Haiqiao Gu, and Minghui Zhou. A Large-Scale Empirical Study on Java Library Migrations: Prevalence, Trends, and Rationales. Under Review. PDF.



Introduction to Computer Systems, Teaching Assistant, Fall 2018

Undergraduate course, Peking University, School of Electronic Engineering and Computer Science, 2018

Introduction to Computer Systems is an undergraduate course at Peking University. This course originates from the famous CMU 15-213 course. It covers a wide range of selected topics from system programming, computer organization, operating systems, and networks. Up to 400 prospective computer science students take this course each year.

Introduction to Computation (C), Teaching Assistant, Fall 2020

Undergraduate course, Peking University, School of Electronic Engineering and Computer Science, 2020

Introduction to Computation (C) is an undergraduate course at Peking University. It is an introductory programming course for students majoring in the liberal arts (literature, foreign languages, history, etc.).