Research Datasets
Dataset Description: A dataset of digits in identifier names
Associated Publication: Understanding Digits in Identifier Names: An Exploratory Study
Repository: https://doi.org/10.5281/zenodo.6308873
Dataset Description: Training set for the SCANL ensemble part-of-speech tagger
Associated Publication: An Ensemble Approach for Annotating Source Code Identifiers with Part-of-speech Tags
Repository: https://github.com/SCANL/datasets/tree/master/ensemble_tagger_training_data
Dataset Description: A dataset of refactoring discussions on Stack Overflow
Associated Publication: How Do I Refactor This? An Empirical Study on Refactoring Trends and Topics in Stack Overflow
Repository: https://doi.org/10.5281/zenodo.5361068
Dataset Description: A manually annotated dataset of part-of-speech tags in test method names
Associated Publication: Using Grammar Patterns to Interpret Test Method Name Evolution
Repository: https://doi.org/10.5281/zenodo.4608143
Dataset Description: A dataset of “simple stupid bugs” (SStuBs) in test and non-test (i.e., production) files in popular open-source Java Maven projects
Associated Publication: On the Distribution of “Simple Stupid Bugs” in Unit Test Files: An Exploratory Study
Repository: https://doi.org/10.5281/zenodo.4608719
Dataset Description: A dataset of 1,335 manually POS-tagged identifier names
Associated Publication: On the Generation, Structure, and Semantics of Grammar Patterns in Source Code Identifiers
Repository: https://github.com/SCANL/datasets/tree/master/grammar_patterns_data
Dataset Description: A dataset of rename refactorings and their related context
Associated Publication: Contextualizing rename decisions using refactorings, commit messages, and data types
Repository: https://drive.google.com/drive/folders/1imaoD_vzJccWKHVBDmT-aTdSolaHensn
Dataset Description: A dataset of 861 abbreviations and their corresponding expansions
Associated Publication: An Open Dataset of Abbreviations and Expansions
Repository: https://github.com/SCANL/datasets/tree/master/abbreviation_expansions_data