This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long standing traditional idea of `good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions.
The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models are promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches.
This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.
"Bridging the Gap between Different Statistical Approaches: An Integrated Framework for Modelling." Internat. Statist. Rev. 71 (2) 335 - 368, August 2003.