Empirical Assessment of Baseline Feature Location Techniques

Context: Feature Location (FL) aims to locate observable functionalities in source code. Given its key role in software maintenance, researchers have proposed a vast array of automated and semi-automated Feature Location Techniques (FLTs). To compare FLTs across this plethora of techniques, an open, standard set of non-subjective, reproducible ‘compare-to’ FLTs should be used for evaluation, and these compare-to techniques should themselves be evaluated against each other. In addition, FLT evaluation is highly confounded by differing FL goals and evaluation criteria, which further hinders comparability.

Objective: This paper moves towards standardising FLT comparability by assessing eight baseline techniques in an empirical design that addresses and characterizes these confounding factors.

Method: These baseline techniques are assessed in four extensive case studies to rank their performance, whilst employing best empirical practice. This assessment is performed in a more reproducible empirical design where the confounding factors are characterized in terms of FL goals.

Results: The results of the case studies suggest that different baseline techniques perform differently in each empirical design (as characterized by different FL goals). In identifying the best implementation of the VSM, LSI and LDA baseline techniques, VSM_Lucene and LSI_Matlab were found to perform generally better than the other implementations, whereas the results for LDA varied across case studies, suggesting that these findings are subject to the characteristics of the systems studied. The relative performance of the baseline techniques is also presented for each FL goal. For example, for the goal of locating the foothold of a feature, the following ranking was found among the baseline techniques: VSM_Lucene > LSI_Matlab > VSM_Tracelab > VSM_Matlab > LDA_Gibbs > LDA_Gensim > LDA_R > LSI_Gensim. Overall, VSM_Lucene was found to be the best-performing FLT for each FL goal, both in each individual case study and when assessed on a combined dataset of all four case studies.
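As an illustration of what a VSM baseline and a foothold-oriented measure involve, the following is a minimal sketch only. It is not the VSM_Lucene implementation from the replication package: the TF-IDF weighting, the function names, the toy corpus, and the choice to score the foothold goal as the rank of the first relevant method are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors for a corpus given as {method_name: [tokens]}.

    The smoothed IDF (log(N/df) + 1) is an assumption of this sketch.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = {
        name: {t: c * idf[t] for t, c in Counter(tokens).items()}
        for name, tokens in docs.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_methods(query_tokens, docs):
    """Rank methods by similarity to the feature query, best first."""
    vectors, idf = tfidf_vectors(docs)
    query = {t: c * idf[t] for t, c in Counter(query_tokens).items() if t in idf}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(docs, key=lambda name: (-cosine(query, vectors[name]), name))


def foothold_rank(ranking, relevant):
    """1-based rank of the first relevant method, or None if none is retrieved."""
    for position, name in enumerate(ranking, start=1):
        if name in relevant:
            return position
    return None


# Hypothetical toy corpus: method names mapped to preprocessed identifier tokens.
corpus = {
    "Canvas.draw": ["draw", "shape", "canvas"],
    "Canvas.zoom": ["zoom", "canvas", "scale"],
    "Parser.parse": ["parse", "token", "script"],
}
ranking = rank_methods(["zoom", "canvas"], corpus)
print(ranking, foothold_rank(ranking, {"Canvas.draw"}))
```

Under this assumed measure, a lower foothold rank means a developer inspects fewer candidate methods before reaching a first entry point into the feature; other FL goals, such as retrieving all relevant methods, would call for different measures.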

Conclusion: The paper finds that VSM_Lucene is the best-performing FLT across the FL goals and case studies presented. In addition, by presenting the relative performance of the baseline techniques, this paper allows a cross-comparison of existing FLTs. That is, it allows a technique compared against one of the baselines, under our empirical conditions, to be compared to other FLTs that were assessed against other baseline technique(s). Finally, it also facilitates comparison of FLTs that were not compared with any baseline technique but that share a dataset in their evaluation, by evaluating a baseline on that same dataset. Researchers may also adopt our systematic evaluation procedure, which is cognisant of FL goals and evaluation criteria, as a step towards standardising baseline evaluation in the field, thus facilitating comparison across FLTs.

Replication Package of Case Studies
Datasets	ArgoUML.zip, Rhino.zip, iBatis.zip, Mylyn.zip, Eclipse.zip, muCommander.zip, CommonsMath.zip, CommonsLang.zip, ArgoUML0.22.zip, Derby.zip, JabRef2.6.zip, jEdit4.3.zip
Results	Results.zip
Implementations of Baseline Techniques	JGibbLDA-v.1.0.zip, LDA_Gensim.py.txt, LSI_Gensim.py.txt, LSI-Matlab.zip, R LDA Complete Implementation.txt, VSM_Luecene.zip, VSM_Tracelab.zip, VSM-Matlab.zip
Statistics	Statistics.zip
