For each repository mined from GitHub, we also calculate its line metrics: the number of code, comment, and blank lines for each programming language. Computing these metrics requires running a background analysis job.
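To illustrate what these line metrics capture, here is a minimal Python sketch that classifies the lines of a single source file as code, comment, or blank. It assumes C-style comment syntax (`//` and `/* ... */`), and the names `LineMetrics` and `count_lines` are hypothetical; the page does not describe how the seart-ghs analysis job actually counts lines.

```python
from dataclasses import dataclass


@dataclass
class LineMetrics:
    code: int = 0
    comment: int = 0
    blank: int = 0


def count_lines(source: str, line_comment: str = "//",
                block_open: str = "/*", block_close: str = "*/") -> LineMetrics:
    """Hypothetical classifier: every line is counted as exactly one of
    code, comment, or blank. Assumes C-style comment syntax."""
    m = LineMetrics()
    in_block = False
    for raw in source.splitlines():
        line = raw.strip()
        if in_block:
            # Any line inside a block comment (even an empty one) is a comment.
            m.comment += 1
            if block_close in line:
                in_block = False
        elif not line:
            m.blank += 1
        elif line.startswith(line_comment):
            m.comment += 1
        elif line.startswith(block_open):
            m.comment += 1
            # Enter block mode unless the comment also closes on this line.
            if block_close not in line[len(block_open):]:
                in_block = True
        else:
            # Lines mixing code and a trailing comment are counted as code.
            m.code += 1
    return m


sample = """// entry point
public class Main {

    /* multi-line
       comment */
    public static void main(String[] args) {}
}
"""
print(count_lines(sample))
```

This sketch ignores comment markers inside string literals; per-language totals would be obtained by summing `LineMetrics` over all files mapped to the same language.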
In 2021, our work was published in the proceedings of the 18th IEEE/ACM International Conference on Mining Software Repositories. The paper itself is accessible through arXiv. Since its publication, it has accrued over 100 citations. You can see which works have cited ours on Google Scholar. If you use our tool or data in your research, please consider citing our work:
@inproceedings{Dabic:msr2021data,
author = {Ozren Dabic and Emad Aghajani and Gabriele Bavota},
title = {Sampling Projects in GitHub for {MSR} Studies},
booktitle = {18th {IEEE/ACM} International Conference on Mining Software Repositories,
{MSR} 2021},
pages = {560--564},
publisher = {{IEEE}},
year = {2021}
}
The SEART GitHub Search Engine (seart-ghs) allows researchers to sample repositories for empirical studies using various combinations of selection criteria. All parameters in the search form are optional. A search with no parameters retrieves all repositories currently available in our database (such a search may take up to a few minutes).
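The "every parameter is optional" behavior can be sketched as a filter in which an unset criterion imposes no constraint, so an empty query matches everything. The fields below (`language`, `min_stars`, `max_stars`, `min_commits`) are hypothetical examples rather than the actual search form, and the real engine queries a database instead of filtering an in-memory list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repo:
    name: str
    language: str
    stars: int
    commits: int


@dataclass
class Criteria:
    # Every field is optional; None means "do not filter on this field".
    language: Optional[str] = None
    min_stars: Optional[int] = None
    max_stars: Optional[int] = None
    min_commits: Optional[int] = None


def matches(repo: Repo, c: Criteria) -> bool:
    """True if the repository satisfies every criterion that is set."""
    if c.language is not None and repo.language != c.language:
        return False
    if c.min_stars is not None and repo.stars < c.min_stars:
        return False
    if c.max_stars is not None and repo.stars > c.max_stars:
        return False
    if c.min_commits is not None and repo.commits < c.min_commits:
        return False
    return True


def search(repos: list[Repo], c: Criteria) -> list[Repo]:
    return [r for r in repos if matches(r, c)]


# Made-up repositories for illustration only.
repos = [
    Repo("a/alpha", "Java", 120, 900),
    Repo("b/beta", "Python", 15, 40),
    Repo("c/gamma", "Java", 11, 5),
]
print([r.name for r in search(repos, Criteria(language="Java", min_commits=100))])
print(len(search(repos, Criteria())))  # no criteria: every repository matches
```

Combining criteria is a logical AND, which is one plausible reading of "combinations of selection criteria"; the page does not say whether other combinators are supported.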
As a design choice, we only mine information about repositories with at least 10 stars. This drastically reduces the number of repositories we store and makes the tool more scalable (e.g., preliminary analyses we performed on Java repositories showed that fewer than 5% of them have at least 10 stars). While we acknowledge that the number of stars is not a good proxy for a repository's quality or relevance, we found the 10-star threshold to be a reasonable way to filter out most personal/toy projects, which are unlikely to be relevant for empirical studies.
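The kind of preliminary measurement behind the "fewer than 5%" figure amounts to computing the share of repositories whose star count meets the threshold. The sketch below shows only that arithmetic; the function name is hypothetical and the star counts are invented, not the data used in the original analysis.

```python
STAR_THRESHOLD = 10  # the mining threshold stated above


def share_at_or_above(star_counts: list[int], threshold: int = STAR_THRESHOLD) -> float:
    """Fraction of repositories whose star count is at least `threshold`."""
    if not star_counts:
        return 0.0
    kept = sum(1 for s in star_counts if s >= threshold)
    return kept / len(star_counts)


# Invented star counts for illustration only (not real GitHub data).
sample = [0, 0, 1, 2, 0, 3, 15, 0, 1, 0, 4, 0, 2, 0, 0, 1, 0, 0, 7, 0]
print(f"{share_at_or_above(sample):.0%} of the sample would be mined")
```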
Note that since seart-ghs continuously updates information about the mined projects, a project initially excluded for having fewer than 10 stars can be mined a few days after its star count reaches the threshold.
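This late admission can be pictured as a periodic update cycle: each pass refreshes the star counts of stored projects and admits any newly observed project that has reached the threshold. This is a minimal sketch assuming a simple in-memory store; `update_cycle` is a hypothetical name, and because the page does not say what happens to stored projects whose stars later drop below 10, the sketch simply keeps them.

```python
STAR_THRESHOLD = 10


def update_cycle(stored: dict[str, int], observed: dict[str, int]) -> list[str]:
    """One hypothetical crawl pass.

    `stored` maps already-mined repository names to star counts and is
    updated in place; `observed` holds the star counts seen in this pass.
    Returns the repositories admitted for the first time.
    """
    newly_mined = []
    for name, stars in observed.items():
        if name in stored:
            stored[name] = stars  # refresh metadata of a known project
        elif stars >= STAR_THRESHOLD:
            stored[name] = stars  # project has just reached the threshold
            newly_mined.append(name)
    return newly_mined


store = {"a/x": 12}
print(update_cycle(store, {"a/x": 13, "b/y": 9}))  # b/y is still excluded
print(update_cycle(store, {"a/x": 13, "b/y": 11}))  # a later pass admits b/y
```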
To report a bug or see known bugs, visit the project's issue tracker.