Malware family discovery using reversible jump MCMC sampling of regimes
Author(s)
Bolton, Alexander
Heard, Nicholas
Type
Journal Article
Abstract
Malware is computer software which has either been designed or modified with malicious intent. Hundreds of thousands of new malware threats appear on the internet each day. This is made possible through reuse of known exploits in computer systems which have not been fully eradicated; existing pieces of malware can be trivially modified and combined to create new malware which is unknown to anti-virus programs. Finding new software with similarities to known malware is therefore an important goal in cyber-security. A dynamic instruction trace of a piece of software is the sequence of machine language instructions it generates when executed. Statistical analysis of a dynamic instruction trace can help reverse engineers infer the purpose and origin of the software that generated it. Instruction traces have been successfully modeled as simple Markov chains, but empirically there are change points in the structure of the traces, with recurring regimes of transition patterns. Here, reversible jump MCMC for change point detection is extended to incorporate regime-switching, allowing regimes to be inferred from malware instruction traces. A similarity measure for malware programs based on regime matching is then used to infer the originating families, leading to compelling performance results.
Date Issued
2018-07-11
Date Accepted
2017-12-30
Citation
Journal of the American Statistical Association, 2018, 113 (524), pp. 1490-1502
ISSN
0162-1459
Publisher
Taylor & Francis
Start Page
1490
End Page
1502
Journal / Book Title
Journal of the American Statistical Association
Volume
113
Issue
524
Copyright Statement
© 2018 Taylor & Francis. This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of the American Statistical Association on 11 Jul 2018, available online: https://doi.org/10.1080/01621459.2018.1423984
Subjects
Science & Technology
Physical Sciences
Statistics & Probability
Mathematics
Change point analysis
Dynamic instruction trace
Regime-switching
Reversible jump Markov chain Monte Carlo
CHAIN MONTE-CARLO
MARKOV-CHAIN
MODEL
0104 Statistics
1403 Econometrics
1603 Demography
Publication Status
Published
Date Published Online
2018-01-19