Abbreviations

An Open Dataset of Abbreviations and Expansions

We present a dataset of abbreviations and expansions, derived from five open source systems, for use by the research and development communities.

This study has been accepted for publication at the 35th IEEE International Conference on Software Maintenance and Evolution (ICSME 2019).

Download Preprint
Repository for dataset

An Empirical Study of Abbreviations and Expansions in Software Artifacts

Expanding abbreviations in source code identifiers is an important text normalization technique, used either to increase developer comprehension or to support the application of natural-language-based tools. This paper closely studies abbreviations and where their expansions occur in different software artifacts. Without abbreviation expansion, developers will spend more time comprehending the code they need to update, and tools that analyze software may obtain weak or non-generalizable results. There are numerous techniques for expanding abbreviations, most of which struggle to reach an average expansion accuracy of 59-62% on general source code identifiers. In this paper, we reveal characteristics of abbreviations and their expansions through an empirical study of 900 abbreviation-expansion pairs extracted from five open source systems, together with an analysis of previous literature. We use these characteristics to identify how current approaches may be complementary, and how their results should be reported in the future to maximize both their reproducibility and our understanding of how they compare with other expansion techniques.
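To make "expansion accuracy" over abbreviation-expansion pairs concrete, here is a minimal, hypothetical sketch in Python. It is not one of the techniques studied in the paper: `expand` is an assumed naive baseline that picks the shortest vocabulary word containing the abbreviation's letters in order (starting with the same letter), and the names `expand`, `accuracy`, `VOCABULARY`, and `PAIRS` are illustrative only. The pairs are invented and are not drawn from the dataset.

```python
from typing import Iterable, List, Optional, Tuple


def is_subsequence_match(abbr: str, word: str) -> bool:
    """True if abbr shares word's first letter and its letters appear in word in order.

    Covers simple prefix (str -> string) and dropped-letter (msg -> message)
    abbreviations. Hypothetical naive heuristic, not the paper's approach.
    """
    abbr, word = abbr.lower(), word.lower()
    if not abbr or not word or abbr[0] != word[0]:
        return False
    remaining = iter(word)
    # `ch in remaining` consumes the iterator, so letters must occur in order.
    return all(ch in remaining for ch in abbr)


def expand(abbr: str, vocabulary: Iterable[str]) -> Optional[str]:
    """Return the shortest matching vocabulary word, or None if nothing matches."""
    candidates = [w for w in vocabulary if is_subsequence_match(abbr, w)]
    return min(candidates, key=len) if candidates else None


def accuracy(pairs: List[Tuple[str, str]], vocabulary: List[str]) -> float:
    """Fraction of (abbreviation, expected expansion) pairs expanded correctly."""
    if not pairs:
        return 0.0
    correct = sum(1 for abbr, expected in pairs if expand(abbr, vocabulary) == expected)
    return correct / len(pairs)


# Invented example data (hypothetical, not from the dataset).
VOCABULARY = ["string", "message", "index", "configuration", "config"]
PAIRS = [("str", "string"), ("msg", "message"), ("idx", "index"), ("cfg", "configuration")]

if __name__ == "__main__":
    for abbr, expected in PAIRS:
        print(abbr, "->", expand(abbr, VOCABULARY), "(expected:", expected + ")")
    print("accuracy:", accuracy(PAIRS, VOCABULARY))  # 3 of 4 correct -> 0.75
```

Here `cfg` is expanded to `config` rather than `configuration` because the baseline prefers the shorter candidate. This illustrates one way a simple heuristic can choose the wrong expansion, and why reporting exactly which pairs were evaluated matters when comparing techniques.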

This study has been accepted for publication at the 35th IEEE International Conference on Software Maintenance and Evolution (ICSME 2019).

Artifact evaluated badge

Download Preprint
Presentation Slides