Real-time detection of dictionary DGA network traffic using deep
learning
File(s)
2003.12805v1.pdf (1.33 MB)
Supporting information
Author(s)
Highnam, Kate
Puzio, Domenic
Luo, Song
Jennings, Nicholas R
Type
Working Paper
Abstract
Botnets and malware continue to avoid detection by static rules engines when
using domain generation algorithms (DGAs) for callouts to unique, dynamically
generated web addresses. Common DGA detection techniques fail to reliably
detect DGA variants that combine random dictionary words to create domain names
that closely mirror legitimate domains. To combat this, we created a novel
hybrid neural network, Bilbo the "bagging" model, that analyses domains and
scores the likelihood they are generated by such algorithms and therefore are
potentially malicious. Bilbo is the first parallel usage of a convolutional
neural network (CNN) and a long short-term memory (LSTM) network for DGA
detection. Our unique architecture is found to be the most consistent in
performance in terms of AUC, F1 score, and accuracy when generalising across
different dictionary DGA classification tasks compared to current
state-of-the-art deep learning architectures. We validate using
reverse-engineered dictionary DGA domains and detail our real-time
implementation strategy for scoring real-world network logs within a large
financial enterprise. In four hours of actual network traffic, the model
discovered at least five potential command-and-control networks that commercial
vendor tools did not flag.
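
To illustrate why dictionary DGA domains can closely resemble legitimate ones, the following is a minimal Python sketch of the general idea of combining dictionary words into domain names. It is a toy under stated assumptions: the word list, the function name `toy_dictionary_dga`, the two-word pattern, the `.com` suffix, and the seed are all hypothetical, and it reproduces neither any real malware family's algorithm nor the authors' code.

```python
import random

# Hypothetical word list, chosen only for illustration.
WORDS = ["river", "stone", "market", "silver", "garden", "north", "cloud", "paper"]


def toy_dictionary_dga(seed, count, tld=".com"):
    """Return `count` two-word domain names derived from `seed`.

    Assumption: a seeded pseudo-random generator stands in for whatever
    shared state a real DGA would use, so the same seed always yields
    the same list of domains.
    """
    rng = random.Random(seed)
    return [rng.choice(WORDS) + rng.choice(WORDS) + tld for _ in range(count)]


# Example: five candidate domains for one (hypothetical) seed.
domains = toy_dictionary_dga(seed=20200328, count=5)
for name in domains:
    print(name)
```

Because every label is built from ordinary words, such names lack the random-looking character patterns of other DGA output, which is consistent with the abstract's point that common detection techniques struggle with these variants.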
Date Issued
2020-03-28
Citation
2020
Publisher
arXiv
Identifier
http://arxiv.org/abs/2003.12805v1
Subjects
cs.CR
cs.LG
Notes
12 pages, 6 figures, preprint, code on GitHub (https://github.com/jinxmirror13/bilbo-bagging-hybrid)
Publication Status
Published