Software Traceability using Latent Semantic Analysis and Relevance Feedback
Kritzinger, PS and H Krüger (2008) Software Traceability using Latent Semantic Analysis and Relevance Feedback. Technical Report CS08-01-00, Department of Computer Science, University of Cape Town.
Software traceability (ST), in its broadest sense, is the process of tracking changes in the corpus of documents created throughout the software development life cycle. Traditional ST approaches, however, require substantial human effort to identify and consistently record the inter-dependencies among software artifacts. In this paper we present an approach that reveals traceability links automatically using the information retrieval (IR) techniques of Latent Semantic Analysis (LSA) and Relevance Feedback, and we present a software tool that implements these ideas. We discuss in detail how software artifacts can be represented in a vector space model and how term extraction and weighting can be accomplished for UML artifacts, such as use cases, interaction diagrams and state diagrams, as well as for source code and natural language text documents. We also explain how the structural information inherent in software artifacts can be preserved during the term extraction and weighting phase of creating traceable artifacts. Unlike other tools, ours incorporates human knowledge into the traceability link recovery process through relevance feedback, with the aim of improving the quality of the recovered links. Finally, we illustrate the effectiveness of our tool-based approach and our proposals through a case study of a pilot software project and compare our results with those of a manual tracing process.
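The vector space representation and the relevance feedback step mentioned in the abstract can be illustrated with a minimal sketch. This is not the tool described in the report: the artifact names and texts are hypothetical, the smoothed tf-idf weighting and the Rocchio-style update (with its `alpha`, `beta`, `gamma` defaults) are assumptions chosen for illustration, and the LSA dimensionality reduction (a truncated singular value decomposition of the term-by-document matrix) is deliberately left out to keep the example dependency-free.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Map {artifact name: text} to {artifact name: {term: tf-idf weight}}."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many artifacts each term occurs.
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    vectors = {}
    for name, words in tokens.items():
        term_freq = Counter(words)
        # Smoothed idf (an assumption) so that no weight collapses to zero.
        vectors[name] = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward artifacts judged relevant and away from the rest."""
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    for group, coeff in ((relevant, beta), (non_relevant, -gamma)):
        for vec in group:
            for term, weight in vec.items():
                updated[term] += coeff * weight / len(group)
    # Negative weights are dropped, a common convention for this update.
    return {term: w for term, w in updated.items() if w > 0.0}


def rank(query, candidates):
    """Return (similarity, artifact name) pairs, most similar first."""
    return sorted(((cosine(query, vec), name) for name, vec in candidates.items()),
                  reverse=True)


# Hypothetical artifacts: one use case and two source files.
artifacts = {
    "UC-Login": "user login password authenticate session",
    "LoginController.java": "login controller authenticate password check",
    "ReportPrinter.java": "report print format page",
}
vectors = tfidf_vectors(artifacts)
query = vectors["UC-Login"]
code = {name: vec for name, vec in vectors.items() if name != "UC-Login"}

print(rank(query, code))
# An analyst confirms one candidate link and rejects the other.
refined = rocchio(query, [vectors["LoginController.java"]],
                  [vectors["ReportPrinter.java"]])
print(rank(refined, code))
```

In this sketch each candidate traceability link is simply a high cosine similarity between two artifact vectors, and the analyst's judgement is fed back by reshaping the query vector before re-ranking.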
|EPrint Type:||Departmental Technical Report|
|Keywords:||Requirements Engineering, Software Traceability, Software Development, Change Management, Latent Semantic Indexing, Relevance Feedback, Information Retrieval, UML|
|Subjects:||D Software: D.2 SOFTWARE ENGINEERING|
|Deposited By:||Pileggi, PP|
|Deposited On:||27 October 2008|