Scalable detection of semantic clones (paper)

2008 · Cited by 331
Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability

Abstract

Several techniques have been developed for identifying similar code fragments in programs. These similar fragments, referred to as code clones, can be used to identify redundant code, locate bugs, or gain insight into program design. Existing scalable approaches to clone detection are limited to finding program fragments that are similar only in their contiguous syntax. Other, semantics-based approaches are more resilient to differences in syntax, such as re-ordered statements, related statements interleaved with other unrelated statements, or the use of semantically equivalent control structures. However, none of these techniques have scaled to real-world code bases. These approaches capture semantic information from Program Dependence Graphs (PDGs), program representations that encode data and control dependencies between statements and predicates. Our definition of a code clone is also based on this representation: we consider program fragments with isomorphic PDGs to be clones.

In this paper, we present the first scalable clone detection algorithm based on this definition of semantic clones. Our insight is the reduction of the difficult graph similarity problem to a simpler tree similarity problem by mapping carefully selected PDG subgraphs to their related structured syntax. We efficiently solve the tree similarity problem to create a scalable analysis. We have implemented this algorithm in a practical tool and performed evaluations on several million-line open source projects, including the Linux kernel. Compared with previous approaches, our tool locates significantly more clones, which are often more semantically interesting than simple copied and pasted code fragments.
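
To make the clone definition above concrete, here is a minimal Python sketch of "fragments with isomorphic PDGs are clones". It is not the paper's algorithm. It assumes straight-line code, tracks data dependencies only (no control dependencies), and uses a hypothetical statement encoding `(label, defined_var, used_vars)` invented for this illustration. The isomorphism check is a brute-force search over permutations, which is exponential in fragment size; that cost is the kind of difficulty the abstract's reduction to tree similarity is meant to avoid.

```python
from itertools import permutations

def build_pdg(stmts):
    """Build a data-dependence graph for straight-line code.

    stmts is a list of (label, defined_var, used_vars) tuples (a hypothetical
    encoding for this sketch). Returns (labels, edges), where edges contains
    (i, j) when statement j reads a variable most recently written by i.
    """
    last_def = {}   # variable name -> index of the statement that last wrote it
    labels, edges = [], set()
    for j, (label, target, uses) in enumerate(stmts):
        for var in uses:
            if var in last_def:
                edges.add((last_def[var], j))
        labels.append(label)
        if target is not None:
            last_def[target] = j
    return labels, edges

def isomorphic(g1, g2):
    """Exact, brute-force test for label-preserving graph isomorphism."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    if sorted(labels1) != sorted(labels2) or len(edges1) != len(edges2):
        return False
    n = len(labels1)
    for perm in permutations(range(n)):
        # perm maps node i of g1 to node perm[i] of g2
        if all(labels1[i] == labels2[perm[i]] for i in range(n)):
            if {(perm[a], perm[b]) for a, b in edges1} == edges2:
                return True
    return False

# a = read(); b = read(); c = a + 1; d = b * 2; e = c + d
frag_a = [("call", "a", []), ("call", "b", []),
          ("add", "c", ["a"]), ("mul", "d", ["b"]), ("add", "e", ["c", "d"])]

# Same computation, statements re-ordered and variables renamed:
# x = read(); z = x + 1; y = read(); w = y * 2; v = z + w
frag_b = [("call", "x", []), ("add", "z", ["x"]),
          ("call", "y", []), ("mul", "w", ["y"]), ("add", "v", ["z", "w"])]

# Same statement kinds, different data flow:
# x = read(); y = read(); z = x + 1; w = z * 2; v = z + w
frag_c = [("call", "x", []), ("call", "y", []),
          ("add", "z", ["x"]), ("mul", "w", ["z"]), ("add", "v", ["z", "w"])]

print(isomorphic(build_pdg(frag_a), build_pdg(frag_b)))  # True
print(isomorphic(build_pdg(frag_a), build_pdg(frag_c)))  # False
```

A purely syntactic comparison would treat `frag_a` and `frag_b` as different because their statement order differs, while their dependence graphs match node for node. `frag_c` has the same multiset of statement kinds but a different dependence structure, so it is rejected.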
