Compiler fuzzing: how much does it matter?
File(s)
3360581 (1).pdf (443.42 KB)
Published version
Author(s)
Marcozzi, Michael
Tang, Qiyi
Donaldson, Alastair
Cadar, Cristian
Type
Journal Article
Abstract
Despite much recent interest in randomised testing (fuzzing) of compilers, the practical impact of fuzzer-found
compiler bugs on real-world applications has barely been assessed. We present the first quantitative and
qualitative study of the tangible impact of miscompilation bugs in a mature compiler. We follow a rigorous
methodology where the bug impact over the compiled application is evaluated based on (1) whether the bug
appears to trigger during compilation; (2) the extent to which generated assembly code changes syntactically
due to triggering of the bug; and (3) whether such changes cause regression test suite failures, or whether
we can manually find application inputs that trigger execution divergence due to such changes. The study
is conducted with respect to the compilation of more than 10 million lines of C/C++ code from 309 Debian
packages, using 12% of the historical and now fixed miscompilation bugs found by four state-of-the-art fuzzers
in the Clang/LLVM compiler, as well as 18 bugs found by human users compiling real code or as a by-product
of formal verification efforts. The results show that almost half of the fuzzer-found bugs propagate to the
generated binaries for at least one package, in which case only a very small part of the binary is typically
affected, yet causing two failures when running the test suites of all the impacted packages. User-reported
and formal verification bugs do not exhibit a higher impact, with a lower rate of triggered bugs and one test
failure. The manual analysis of a selection of the syntactic changes caused by some of our bugs (fuzzer-found
and non-fuzzer-found) in package assembly code shows that either these changes have no semantic impact or
that they would require very specific runtime circumstances to trigger execution divergence.
Date Issued
2019-10
Date Acceptance
2019-08-31
Citation
Proceedings of the ACM on Programming Languages, 2019, 3, pp.155:1-155:29
ISSN
2475-1421
Publisher
Association for Computing Machinery (ACM)
Start Page
155:1
End Page
155:29
Journal / Book Title
Proceedings of the ACM on Programming Languages
Volume
3
Copyright Statement
© 2019 Copyright held by the owner/author(s). The work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Sponsor
Engineering & Physical Science Research Council (EPSRC)
Identifier
https://dl.acm.org/doi/abs/10.1145/3360581
Grant Number
EP/R011605/1
Publication Status
Published
Article Number
Article 155
Date Publish Online
2019-10-01