Open Access
May 2000
Data Mining for Fun and Profit
Niall M. Adams, Gordon Blunt, David J. Hand, Mark G. Kelly
Statist. Sci. 15(2): 111-131 (May 2000). DOI: 10.1214/ss/1009212753

Abstract

Data mining is defined as the process of seeking interesting or valuable information within large data sets. This presents novel challenges and problems, distinct from those typically arising in the allied areas of statistics, machine learning, pattern recognition or database science. A distinction is drawn between the two data mining activities of model building and pattern detection. Even though statisticians are familiar with the former, the large data sets involved in data mining mean that novel problems do arise. The second of the activities, pattern detection, presents entirely new classes of challenges, some arising, again, as a consequence of the large sizes of the data sets. Data quality is a particularly troublesome issue in data mining applications, and this is examined. The discussion is illustrated with a variety of real examples.

Citation

Niall M. Adams, Gordon Blunt, David J. Hand, Mark G. Kelly. "Data Mining for Fun and Profit." Statist. Sci. 15(2): 111-131, May 2000. https://doi.org/10.1214/ss/1009212753

Information

Published: May 2000
First available in Project Euclid: 24 December 2001

Digital Object Identifier: 10.1214/ss/1009212753

Keywords: computers, data mining, databases, knowledge discovery, large data sets

Rights: Copyright © 2000 Institute of Mathematical Statistics

Vol.15 • No. 2 • May 2000