Abstract
Isolation forest is a popular method for anomaly detection introduced in Liu, Ting and Zhou (2008, 2012). Nonetheless, its statistical properties remain poorly understood. We study the scoring function induced by the isolation forest on a finite sample in the limit as the number of trees tends to infinity, using an analytical expression that we derive. We prove that the isolation forest method is effective at detecting geometrically isolated points within a finite sample. We then study the large-sample limit of the scoring function in random designs as well as in sequences of regular designs, and we find that the isolation forest method acts as a detector of the support of the underlying distribution. We also find that densely clustered anomalies are not detected asymptotically by the isolation forest method, a phenomenon known as the masking effect, whereas isolation forest anomaly detection is robust to training on normal data that are sparsely contaminated by anomalies. Numerical examples confirm the theoretical results.
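To make the object of study concrete, the sketch below estimates the isolation forest score of a point by Monte Carlo, following the original construction of Liu, Ting and Zhou (2008): each random isolation tree splits on a uniformly drawn attribute at a uniformly drawn threshold, and the score is s(x) = 2^(-E[h(x)]/c(n)), where h(x) is the path length of x and c(n) = 2H(n-1) - 2(n-1)/n. Averaging over many trees approximates the infinite-tree limit numerically; this is not the analytical expression derived in the paper. Assumptions made for illustration: every tree uses the full training sample (no subsampling), there is no height limit by default, and attributes that are constant within a node are never drawn. The function names `c`, `path_length` and `anomaly_score` are hypothetical.

```python
import math
import random


def c(n):
    """Normalisation c(n) = 2 H(n-1) - 2 (n-1)/n, with H the harmonic number."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / k for k in range(1, n))  # exact H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def path_length(x, sample, rng, height_limit=None):
    """Path length of x in one random isolation tree grown on `sample`.

    Only the branch that x follows is simulated: splits at nodes off that
    branch never affect where x lands, so the result has the same
    distribution as growing the whole tree and then dropping x down it.
    """
    subset = list(sample)
    depth = 0
    while len(subset) > 1 and (height_limit is None or depth < height_limit):
        # Attributes that still vary inside the current node (assumption:
        # constant attributes are simply never drawn).
        dims = [q for q in range(len(x))
                if min(p[q] for p in subset) < max(p[q] for p in subset)]
        if not dims:
            break  # all remaining points are identical and cannot be split
        q = rng.choice(dims)
        lo = min(p[q] for p in subset)
        hi = max(p[q] for p in subset)
        split = rng.uniform(lo, hi)
        # x goes to the side of the split it falls on; keep the points there.
        if x[q] < split:
            subset = [p for p in subset if p[q] < split]
        else:
            subset = [p for p in subset if p[q] >= split]
        depth += 1
    # External node still holding several points: add their expected depth.
    return depth + c(len(subset))


def anomaly_score(x, sample, n_trees=2000, seed=0, height_limit=None):
    """Monte Carlo estimate of s(x) = 2 ** (-E[h(x)] / c(n)), n = len(sample)."""
    rng = random.Random(seed)
    total = sum(path_length(x, sample, rng, height_limit) for _ in range(n_trees))
    mean_h = total / n_trees
    return 2.0 ** (-mean_h / c(len(sample)))


if __name__ == "__main__":
    sample = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,)]
    print("isolated point:", anomaly_score((5.0,), sample))
    print("clustered point:", anomaly_score((0.1,), sample))
```

On this toy sample the geometrically isolated point (5.0,) is typically separated after one or two splits, so it should receive a noticeably higher score than (0.1,), which sits inside the cluster. Increasing `n_trees` reduces the Monte Carlo error of the estimate of E[h(x)].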
Acknowledgments
We thank three anonymous referees and the Associate Editor for insightful comments and suggestions.
Citation
Bruno Pelletier. "On the statistical properties of the isolation forest anomaly detection method." Electron. J. Statist. 18 (2) 4322 - 4381, 2024. https://doi.org/10.1214/24-EJS2305