Open Access
Stochastic identification of malware with dynamic traces
Curtis Storlie, Blake Anderson, Scott Vander Wiel, Daniel Quist, Curtis Hash, Nathan Brown
Ann. Appl. Stat. 8(1): 1-18 (March 2014). DOI: 10.1214/13-AOAS703


A novel approach to malware classification is introduced based on analysis of instruction traces that are collected dynamically from the program in question. The method has been implemented online in a sandbox environment (i.e., a security mechanism for separating running programs) at Los Alamos National Laboratory, and is intended for eventual host-based use, provided the issue of sampling the instructions executed by a given process without disruption to the user can be satisfactorily addressed. The procedure represents an instruction trace with a Markov chain structure in which the transition matrix, $\mathbf{P} $, has rows modeled as Dirichlet vectors. The malware class (malicious or benign) is modeled using a flexible spline logistic regression model with variable selection on the elements of $\mathbf{P} $, which are observed with error. The utility of the method is illustrated on a sample of traces from malware and nonmalware programs, and the results are compared to other leading detection schemes (both signature and classification based). This article also has supplementary materials available online.
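As an illustration of the Markov chain representation described in the abstract, the following is a minimal sketch of how a transition matrix $\mathbf{P}$ might be estimated from a single instruction trace when each row is given a Dirichlet prior. It is not the authors' implementation: the function name, the toy trace, the three instruction categories, and the symmetric prior with pseudo-count `alpha` are all assumptions made for the example. Under a Dirichlet prior and multinomial transition counts, the posterior mean of each row is the smoothed relative frequency computed below.

```python
def posterior_mean_transition_matrix(trace, states, alpha=1.0):
    """Posterior mean of a Markov transition matrix under a symmetric
    Dirichlet(alpha) prior on each row (hypothetical helper, illustration only)."""
    index = {s: i for i, s in enumerate(states)}
    k = len(states)

    # Count observed transitions between consecutive instructions.
    counts = [[0] * k for _ in range(k)]
    for current, following in zip(trace, trace[1:]):
        counts[index[current]][index[following]] += 1

    # Dirichlet-multinomial conjugacy: add the prior pseudo-count to each
    # cell and normalize each row so that it sums to one.
    matrix = []
    for row in counts:
        total = sum(row) + alpha * k
        matrix.append([(c + alpha) / total for c in row])
    return matrix


# Toy trace over three assumed instruction categories.
trace = ["mov", "push", "call", "mov", "mov", "push", "call"]
states = ["mov", "push", "call"]
P_hat = posterior_mean_transition_matrix(trace, states)
```

Each row of `P_hat` is a probability vector, and rows with few observed transitions are pulled toward the uniform distribution, which is one simple way of reflecting that the elements of $\mathbf{P}$ are observed with error. In a classifier along the lines described above, the entries of such an estimated matrix would serve as the predictors.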



Curtis Storlie, Blake Anderson, Scott Vander Wiel, Daniel Quist, Curtis Hash, Nathan Brown. "Stochastic identification of malware with dynamic traces." Ann. Appl. Stat. 8 (1) 1-18, March 2014.


Published: March 2014
First available in Project Euclid: 8 April 2014

zbMATH: 06302225
MathSciNet: MR3191980
Digital Object Identifier: 10.1214/13-AOAS703

Keywords: Adaptive LASSO, classification, Elastic net, Empirical Bayes, logistic regression, Malware detection, Relaxed Lasso, splines

Rights: Copyright © 2014 Institute of Mathematical Statistics
