Data Mining for Fun and Profit

Niall M. Adams; Gordon Blunt; David J. Hand; Mark G. Kelly

doi:10.1214/ss/1009212753

Statist. Sci. 15(2): 111-131 (May 2000). DOI: 10.1214/ss/1009212753

Abstract

Data mining is defined as the process of seeking interesting or valuable information within large data sets. This presents novel challenges and problems, distinct from those typically arising in the allied areas of statistics, machine learning, pattern recognition or database science. A distinction is drawn between the two data mining activities of model building and pattern detection. Even though statisticians are familiar with the former, the large data sets involved in data mining mean that novel problems do arise. The second of the activities, pattern detection, presents entirely new classes of challenges, some arising, again, as a consequence of the large sizes of the data sets. Data quality is a particularly troublesome issue in data mining applications, and this is examined. The discussion is illustrated with a variety of real examples.

Citation

Niall M. Adams, Gordon Blunt, David J. Hand, Mark G. Kelly. "Data Mining for Fun and Profit." Statist. Sci. 15(2): 111-131, May 2000. https://doi.org/10.1214/ss/1009212753