- Statist. Sci.
- Volume 15, Number 2 (2000), 111-131.
Data Mining for Fun and Profit
Data mining is defined as the process of seeking interesting or valuable information within large data sets. This presents novel challenges and problems, distinct from those typically arising in the allied areas of statistics, machine learning, pattern recognition or database science. A distinction is drawn between the two data mining activities of model building and pattern detection. Even though statisticians are familiar with the former, the large data sets involved in data mining mean that novel problems do arise. The second of the activities, pattern detection, presents entirely new classes of challenges, some arising, again, as a consequence of the large sizes of the data sets. Data quality is a particularly troublesome issue in data mining applications, and this is examined. The discussion is illustrated with a variety of real examples.
First available in Project Euclid: 24 December 2001
Hand, David J.; Blunt, Gordon; Kelly, Mark G.; Adams, Niall M. Data Mining for Fun and Profit. Statist. Sci. 15 (2000), no. 2, 111-131. doi:10.1214/ss/1009212753. https://projecteuclid.org/euclid.ss/1009212753