Context: Feature Location (FL) aims to locate observable functionalities in source code. Given its key role in software maintenance, researchers have proposed a vast array of automated and semi-automated Feature Location Techniques (FLTs). To compare FLTs across this plethora of existing techniques, an open, standard set of non-subjective, reproducible ‘compare-to’ FLTs should be used for evaluation, and these compare-to techniques should themselves be evaluated against each other. In addition, FLT evaluation is highly confounded by differing FL goals and evaluation criteria, which further hinders comparability.

Objective: This paper moves towards standardising FLT comparability by assessing eight baseline techniques in an empirical design that addresses and characterizes these confounding factors.

Method: The baseline techniques are assessed and ranked in four extensive case studies that follow best empirical practice. The assessment uses a more reproducible empirical design in which the confounding factors are characterized in terms of FL goals.

Results: The case studies suggest that different baseline techniques perform differently in each empirical design (as characterized by different FL goals). Among the implementations of the VSM, LSI and LDA baseline techniques, VSM_Lucene and LSI_Matlab are found to perform generally better than the other implementations. The results for LDA, however, varied across case studies, suggesting that these findings depend on the characteristics of the systems studied. The relative performance of the baseline techniques is also presented for each FL goal. For example, for the goal of locating the foothold of a feature, the following ranking was found between baseline techniques: VSM_Lucene > LSI_Matlab > VSM_Tracelab > VSM_Matlab > LDA_Gibbs > LDA_Gensim > LDA_R > LSI_Gensim. Overall, VSM_Lucene is found to be the best-performing FLT for each FL goal in each case study, and also when assessed on a combined dataset of all four case studies.
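To make the VSM baseline and the "foothold" goal more concrete, the following is a minimal sketch, not the Lucene, Matlab or TraceLab implementation evaluated in the paper. It assumes each method is represented as a list of identifier/comment tokens and ranks methods by cosine similarity between TF-IDF vectors and a feature query. It then scores the ranking by the rank of the first relevant method, which is one common way to measure foothold effectiveness (an assumption here, not necessarily the paper's exact criterion). The toy corpus, query, gold set and helper names (`tfidf_vectors`, `rank_methods`, `first_relevant_rank`) are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / df[term]) for term in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_methods(methods, query):
    """Rank method names by VSM similarity to the query (most similar first)."""
    names = list(methods)
    vectors, idf = tfidf_vectors([methods[name] for name in names])
    # Weight query terms with the corpus IDF; terms unseen in the corpus get 0.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
    scores = {name: cosine(q, vec) for name, vec in zip(names, vectors)}
    # Sort by descending score, breaking ties by name so the order is deterministic.
    return sorted(names, key=lambda name: (-scores[name], name))


def first_relevant_rank(ranking, relevant):
    """1-based rank of the first relevant method (the 'foothold'), or None if absent."""
    for position, name in enumerate(ranking, start=1):
        if name in relevant:
            return position
    return None


# Hypothetical usage on a three-method toy corpus.
methods = {
    "Editor.save": ["save", "file", "write", "buffer"],
    "Editor.open": ["open", "file", "read", "buffer"],
    "Search.find": ["find", "text", "match"],
}
ranking = rank_methods(methods, ["save", "file"])
print(ranking, first_relevant_rank(ranking, {"Editor.save"}))
# prints: ['Editor.save', 'Editor.open', 'Search.find'] 1
```

Under this kind of measure a lower rank is better, so comparing techniques for the foothold goal amounts to comparing their first-relevant ranks over the same set of feature queries.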

Conclusion: The paper finds that VSM_Lucene is the best-performing FLT across the FL goals and case studies presented. In addition, by presenting the relative performance of the baseline techniques, this paper allows a cross-comparison of existing FLTs. That is, a technique compared against one of the baselines under our empirical conditions can be compared with other FLTs that were assessed against other baseline technique(s). Finally, FLTs that were not compared with any baseline technique but share a dataset in their evaluation can be compared by evaluating a baseline on that same dataset. Researchers may also adopt our systematic evaluation procedure, which is cognisant of FL goals and evaluation criteria, as a step towards standardising baseline evaluation in the field and thus facilitating comparison across FLTs.

Replication Package of Case Studies

Datasets

Eclipse.zip
muCommander.zip
CommonsMath.zip
CommonsLang.zip
ArgoUML0.22.zip
Derby.zip
JabRef2.6.zip
jEdit4.3.zip

Results

Results.zip

Implementations of Baseline Techniques

Statistics