A Replicated Study on Duplicate Detection: Using Apache Lucene to Search Among Android Defects

Markus Borg, Per Runeson, Jens Johansson, Mika Mäntylä

Research output: Chapter in Book/Report/Conference proceeding › Paper in conference proceeding › peer-review

Abstract

Context: Duplicate detection is a fundamental part of issue management. Systems that can predict whether a new defect report will be closed as a duplicate may decrease costs by limiting rework and collecting related pieces of information. Goal: Our work explores using Apache Lucene for large-scale duplicate detection based on textual content. We also evaluate the previous claim that results improve if the title is weighted as more important than the description. Method: We conduct a conceptual replication of a well-cited study conducted at Sony Ericsson, using Lucene to search the public Android defect repository. In line with the original study, we explore how varying the weighting of the title and the description affects accuracy. Results: We show that Lucene obtains the best results when the defect report title is weighted three times higher than the description, a bigger difference than has been previously acknowledged. Conclusions: Our work shows the potential of using Lucene as a scalable solution for duplicate detection.
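
The title-versus-description weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it does not use Lucene: the scoring is a simplified TF-IDF written in plain Python, and the names `rank_candidates`, `field_score`, `title_boost`, and `description_boost`, as well as the report dictionary layout, are assumptions made for illustration only. The default boost of 3.0 mirrors the ratio reported in the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def field_score(query_tokens, field_tokens, idf):
    """Sum sqrt(tf) * idf over the distinct query terms found in one field."""
    tf = Counter(field_tokens)
    return sum(math.sqrt(tf[t]) * idf.get(t, 0.0) for t in set(query_tokens))


def rank_candidates(new_report, reports, title_boost=3.0, description_boost=1.0):
    """Rank stored reports by boosted textual similarity to new_report.

    Each report is a dict with hypothetical keys "id", "title", "description".
    Returns a list of (id, score) pairs, highest score first.
    """
    # Document frequency: in how many stored reports each term occurs.
    df = Counter()
    for r in reports:
        df.update(set(tokenize(r["title"]) + tokenize(r["description"])))
    n = len(reports)
    # Smoothed inverse document frequency (an assumed, simplified formula).
    idf = {t: 1.0 + math.log(n / (1 + c)) for t, c in df.items()}

    # The query is the full text of the incoming report.
    query = tokenize(new_report["title"]) + tokenize(new_report["description"])

    scored = []
    for r in reports:
        score = (
            title_boost * field_score(query, tokenize(r["title"]), idf)
            + description_boost * field_score(query, tokenize(r["description"]), idf)
        )
        scored.append((r["id"], score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With `title_boost` above `description_boost`, a stored report whose title matches the new report outranks one that contains the same terms only in its description; lowering `title_boost` below `description_boost` reverses that order.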
Original language: English
Title of host publication: [Host publication title missing]
Number of pages: 4
DOIs
Publication status: Published - 2014
Event: 8th International Symposium on Empirical Software Engineering and Measurement - Turin, Italy
Duration: 2014 Sept 18 → …

Conference

Conference: 8th International Symposium on Empirical Software Engineering and Measurement
Country/Territory: Italy
City: Turin
Period: 2014/09/18 → …

Subject classification (UKÄ)

  • Computer Science

Free keywords

  • software evolution
  • issue management
  • information retrieval
  • replication
