A Change-Type Based Empirical Study on the Stability of Cloned Code

Published:

Md Saidur Rahman, Chanchal K. Roy, "A Change-Type Based Empirical Study on the Stability of Cloned Code", IEEE 14th International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 31-40, 2014, Victoria, BC, Canada. Preprint Published


Abstract

Clones are the duplicate or similar code blocks in software systems. A large number of studies concerning the impacts of clones on software systems mainly focus on the frequency of changes to evaluate stability, consistency in evolution and introduction of bugs. Although it is obvious that not each type of changes has equal impact on software systems, none of the existing studies take the types of changes and their significance into account during comparative evaluation of stability of cloned and non-cloned code. This paper presents an empirical study on the comparative stability of cloned and non-cloned code from the perspective of different change types. Changes from successive revisions are extracted and classified using Change Distiller which employs Abstract Syntax Tree (AST) differencing of the successive revisions of source code and assigns the corresponding level of significance to each of the classified changes. We detect exact (Type-1) and near-miss (Type-2 and Type-3) clones using the hybrid clone detection tool NiCad. Extracted and classified changes and clone information are then analyzed to compare the stability of cloned and non-cloned code from three different perspectives: types of clones, types of changes with respect to the significance of changes, and size and extent of evolution of the systems. Our study on seven open-source Java systems with diversity in their size, length of evolution and application domain shows that changes are more frequent in cloned code than in noncloned code and Type-1 clones are comparatively more vulnerable to the stability of the systems. Therefore, cloned code is less stable than non-cloned code suggesting that cloned code is likely to pose more maintenance challenges than non-cloned code.