Open Access
On the statistical properties of the isolation forest anomaly detection method
Bruno Pelletier
Electron. J. Statist. 18(2): 4322-4381 (2024). DOI: 10.1214/24-EJS2305

Abstract

Isolation forest is a popular method for anomaly detection introduced in Liu, Ting and Zhou (2008, 2012). Nonetheless, its statistical properties are poorly understood. We study the scoring function that is induced by the isolation forest over a finite sample in the limit as the number of trees tends to infinity, based on an analytical expression that we derive. The isolation forest method is proved to be effective at detecting geometrically isolated points within a finite sample. We then study the large sample limit of the scoring function in random designs as well as in sequences of regular designs, and we find that the isolation forest method performs as a detector of the support of the underlying distribution. We also find that dense clustered anomalies are not detected asymptotically by the isolation forest method, a phenomenon known as the masking effect, but that isolation forest anomaly detection is robust to training with normal data sparsely contaminated by anomalies. Numerical examples are provided that confirm the theoretical results.
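
To make the object of study concrete, the following is a minimal sketch of the standard isolation forest score in the style of Liu, Ting and Zhou, written in plain Python. It is not the analytical expression derived in the paper, and it uses a finite number of trees rather than the infinite-tree limit. The function names (c_factor, build_tree, path_length, anomaly_scores), the tuple-based tree representation, and the toy data set are assumptions made for illustration only.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Approximate average path length of an unsuccessful search in a
    binary search tree built on n points (used to normalise depths)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with uniformly random axis-aligned splits."""
    n = len(points)
    if n <= 1 or depth >= max_depth:
        return ("leaf", n)
    # Only split along coordinates on which the points actually differ.
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return ("leaf", n)
    dim = rng.choice(dims)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth of the leaf reached by x, plus a correction for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    child = left if x[dim] < split else right
    return path_length(x, child, depth + 1)


def anomaly_scores(data, n_trees=100, subsample=256, seed=0):
    """Score in (0, 1]; values closer to 1 indicate easier isolation."""
    rng = random.Random(seed)
    psi = min(subsample, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / (n_trees * norm))
            for x in data]


# Toy data (hypothetical): a Gaussian cluster plus one geometrically isolated point.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
data.append((8.0, 8.0))

scores = anomaly_scores(data)
top = max(range(len(data)), key=scores.__getitem__)
print("highest-scoring point:", data[top])
```

In this toy setting the isolated point is separated from the cluster after very few random splits, so its average path length is short and its score is high, which is the finite-sample behaviour the abstract refers to. The masking effect could be explored with the same sketch by replacing the single isolated point with a dense cluster of anomalies.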

Acknowledgments

We thank three anonymous referees and the Associate Editor for insightful comments and suggestions.

Citation

Bruno Pelletier. "On the statistical properties of the isolation forest anomaly detection method." Electron. J. Statist. 18 (2) 4322 - 4381, 2024. https://doi.org/10.1214/24-EJS2305

Information

Received: 1 February 2024; Published: 2024
First available in Project Euclid: 14 November 2024

Digital Object Identifier: 10.1214/24-EJS2305

Subjects:
Primary: 62G20
Secondary: 62H30

Keywords: anomaly detection, binary tree, isolation forest, recursive partition, scoring
