
Research Project Title
Pattern Discovery for Defect Detection in Large Software
Principal Investigators
Yuanyuan Zhao
Unit # 35
Project Overview
Many applications, ranging from enterprise systems to embedded ones, require high robustness and reliability. Despite costly efforts to improve software-development methodologies, software bugs in deployed code continue to thrive, often accounting for as much as 40% of computer system failures and contributing to more than 60% of security vulnerabilities.
Unfortunately, poor programmer productivity has historically been the dark spot of the IT revolution. The problem has become more pressing as software complexity grows with multi-threading, client-server communication, embedded systems, feature-rich applications, and rising demands for security, privacy, and performance.
While existing software defect detection techniques, including model checking, static analysis, and dynamic monitoring, are effective at detecting some types of defects, they are still far from providing a complete solution. Most model checking and static analysis tools require programmers to provide specifications and/or program annotations, which are usually too tedious for programmers to write. Tools such as Rational's Purify, which do not require specifications or annotations, can only detect general defects, i.e., defects that violate general programming rules such as “an allocated buffer cannot be freed more than once” or “an array cannot be accessed out of bounds”. They fail to detect bugs, usually semantic ones, that are specific to the target software.
Our research takes a fundamentally different and innovative pattern-based approach to defect detection. By applying data mining and statistical techniques to source code and execution profiles, it will allow us to go beyond syntactic information, automatically extract software-specific semantic information, and detect violations of it. Such semantic information can serve not only as a program specification that enhances software productivity but also as a basis for detecting violations and improving software security, reliability, and quality.
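To make the idea of mining implicit rules and flagging violations more concrete, the Python sketch below derives pairwise rules of the form “functions that call A also call B” from hypothetical per-function call sets, then reports functions that break a rule. It is a simplified illustration of this kind of mining, not the project's actual tools: the input data, the restriction to pairs (rather than general itemsets), and the support and confidence thresholds are all assumptions chosen for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each function summarized as the set of functions it
# calls. A real tool would extract these sets from parsed source code.
FUNCTIONS = {
    "open_dev":  {"lock", "unlock", "read_reg"},
    "close_dev": {"lock", "unlock", "write_reg"},
    "reset_dev": {"lock", "unlock", "write_reg"},
    "probe_dev": {"lock", "read_reg"},  # calls lock but never unlock
    "log_msg":   {"printf"},
}

def mine_rules(functions, min_support=3, min_confidence=0.75):
    """Return rules (a, b, support, confidence) meaning 'calling a implies calling b'.

    support    = number of functions that call both a and b
    confidence = support / number of functions that call a
    The thresholds are illustrative assumptions.
    """
    item_count = Counter()
    pair_count = Counter()
    for calls in functions.values():
        item_count.update(calls)
        pair_count.update(combinations(sorted(calls), 2))
    rules = []
    for (x, y), support in pair_count.items():
        if support < min_support:
            continue
        # Check the rule in both directions.
        for a, b in ((x, y), (y, x)):
            confidence = support / item_count[a]
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return rules

def find_violations(functions, rules):
    """Flag functions that use a rule's antecedent but not its consequent."""
    violations = []
    for name, calls in sorted(functions.items()):
        for a, b, _, _ in rules:
            if a in calls and b not in calls:
                violations.append((name, a, b))
    return violations

rules = mine_rules(FUNCTIONS)
violations = find_violations(FUNCTIONS, rules)
for name, a, b in violations:
    print(f"{name}: calls {a} but not {b}")
# probe_dev: calls lock but not unlock
```

Here three of the four functions that call `lock` also call `unlock`, so the mined rule `lock -> unlock` has confidence 0.75 and `probe_dev` is reported as a likely violation. Requiring both a minimum support and a minimum confidence is what separates a pattern the code base actually follows from a coincidence, and it also lets rare deviations from a strong pattern stand out.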
We have conducted some preliminary research in the proposed direction, in particular applying data mining to source code analysis to detect copy-paste and related bugs, as well as implicit programming rules and their violations. Our preliminary research has shown promising results: our preliminary tools, CP-Miner and PR-Miner, have detected many new bugs in the latest versions of several large open-source software systems, including Linux, Apache, FreeBSD, and PostgreSQL, with up to millions of lines of code.
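As one hedged illustration of how a copy-paste related bug can be spotted, the sketch below assumes that a copied code segment and its original have already been located and aligned identifier by identifier (the harder step, which it does not attempt). It then flags identifiers that were renamed in most places in the copy but left unchanged in a few, a typical “forgot to change” mistake. The function name `forgot_to_change`, the input format, and the 0.5 threshold are hypothetical choices for this example.

```python
from collections import Counter, defaultdict

def forgot_to_change(original, copy, threshold=0.5):
    """Return (identifier, unchanged_ratio) pairs for suspicious renamings.

    `original` and `copy` are assumed to be aligned lists of the identifiers
    appearing in a code segment and in its pasted copy.
    """
    if len(original) != len(copy):
        raise ValueError("segments must have the same number of identifiers")
    # For each identifier in the original, count what it became in the copy.
    renames = defaultdict(Counter)
    for old, new in zip(original, copy):
        renames[old][new] += 1
    suspects = []
    for old, targets in renames.items():
        total = sum(targets.values())
        unchanged_ratio = targets[old] / total
        # Mostly renamed, yet some occurrences kept the old name.
        if 0 < unchanged_ratio < threshold:
            suspects.append((old, unchanged_ratio))
    return suspects

# Original: for (i = 0; i < n; i++) total += a[i];
# Copy:     for (j = 0; j < n; j++) total += a[i];   <- last i not renamed
original = ["i", "i", "n", "i", "total", "a", "i"]
pasted   = ["j", "j", "n", "j", "total", "a", "i"]
print(forgot_to_change(original, pasted))
# [('i', 0.25)]
```

Identifiers that are never renamed (`n`, `total`, `a`) or renamed everywhere are treated as consistent; only a partial renaming is reported, since that is the case most likely to reflect an edit the programmer forgot to finish.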
To validate our idea, we will pursue the following three research objectives:
Customize and apply our preliminary pattern-based tools to detect defects in Motorola software. Since our preliminary tools already work for large open-source software, we can readily evaluate them first on Motorola source code to detect software defects. Currently these tools only work with C programs, so we need to extend our front-end parsers to handle programs written in other programming languages. Additionally, other software-specific extensions may be necessary to increase detection accuracy.
Develop new pattern-based techniques to detect other types of defects. We plan to develop more pattern-based techniques, such as patch pattern analysis and dynamic execution pattern analysis, to detect other types of defects and security vulnerabilities in software.
Improve Motorola’s software quality, security, and productivity. Our final objective is to provide a set of effective automatic software defect detection tools that can significantly improve Motorola’s software quality, software security, and software development productivity.