Bayesian Analysis

Exact Bayesian regression of piecewise constant functions

Marcus Hutter

Full-text: Open access

Abstract

We derive an exact and efficient Bayesian regression algorithm for piecewise constant functions of unknown segment number, boundary locations, and levels. The derivation works for any noise and segment level prior, e.g. Cauchy, which can handle outliers. We derive simple but good estimates for the in-segment variance. We also propose a Bayesian regression curve as a better way of smoothing data without blurring boundaries. The Bayesian approach also allows straightforward determination of the evidence, break probabilities, and error estimates, which are useful for model selection and for significance and robustness studies. We discuss the performance on synthetic and real-world examples. Many possible extensions are discussed.
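The abstract does not spell out the recursion, so the following is only a minimal sketch of how a dynamic program can sum exactly over all segmentations; it is not the paper's algorithm. It assumes Gaussian noise with known standard deviation `sigma`, a Gaussian prior N(`nu`, `rho`^2) on each segment level, and a uniform prior over boundary placements for a given number of segments. All function and parameter names are hypothetical.

```python
import math
import numpy as np


def segment_log_evidence(y, sigma, nu, rho):
    """log_A[i, j]: log marginal likelihood of y[i:j] sharing one level.

    Assumes y_t ~ N(mu, sigma^2) and mu ~ N(nu, rho^2), with mu integrated out.
    """
    n = len(y)
    d = np.asarray(y, dtype=float) - nu
    c1 = np.concatenate(([0.0], np.cumsum(d)))       # prefix sums of d
    c2 = np.concatenate(([0.0], np.cumsum(d * d)))   # prefix sums of d^2
    log_A = np.full((n + 1, n + 1), -np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            m = j - i
            s, q = c1[j] - c1[i], c2[j] - c2[i]
            quad = (q - rho**2 / (sigma**2 + m * rho**2) * s**2) / sigma**2
            log_A[i, j] = (-0.5 * m * math.log(2 * math.pi * sigma**2)
                           - 0.5 * math.log1p(m * rho**2 / sigma**2)
                           - 0.5 * quad)
    return log_A


def log_evidence_by_segments(y, k_max, sigma=1.0, nu=0.0, rho=1.0):
    """Return log p(y | k segments) for k = 1..k_max (requires k_max <= len(y))."""
    n = len(y)
    log_A = segment_log_evidence(y, sigma, nu, rho)
    # log_L[k, j]: log of the evidence of y[:j] summed over all splits into k segments
    log_L = np.full((k_max + 1, n + 1), -np.inf)
    log_L[0, 0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(k, n + 1):
            # the last segment is y[i:j]; the first k-1 segments cover y[:i]
            log_L[k, j] = np.logaddexp.reduce(log_L[k - 1, :j] + log_A[:j, j])
    # uniform prior over the C(n-1, k-1) possible boundary placements
    return np.array([log_L[k, n] - math.log(math.comb(n - 1, k - 1))
                     for k in range(1, k_max + 1)])


# Synthetic step data: one level change halfway through.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 0.3, 30), rng.normal(2.0, 0.3, 30)])
log_E = log_evidence_by_segments(y, k_max=4, sigma=0.3, nu=1.0, rho=2.0)
# Posterior over the number of segments under a uniform prior on k = 1..4.
posterior_k = np.exp(log_E - np.logaddexp.reduce(log_E))
print(posterior_k)
```

The sketch costs O(k_max · n²) time, since each of the n² candidate segments is evaluated once and reused across all segment counts. Replacing the Gaussian segment evidence with a numerically integrated one would be one way to accommodate heavier-tailed choices such as the Cauchy prior mentioned in the abstract.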

Article information

Source
Bayesian Anal. Volume 2, Number 4 (2007), 635-664.

Dates
First available in Project Euclid: 22 June 2012

Permanent link to this document
http://projecteuclid.org/euclid.ba/1340370708

Digital Object Identifier
doi:10.1214/07-BA225

Mathematical Reviews number (MathSciNet)
MR2361968

Zentralblatt MATH identifier
1331.62144

Keywords
Bayesian regression; exact polynomial algorithm; non-parametric inference; piecewise constant function; dynamic programming; change point problem

Citation

Hutter, Marcus. Exact Bayesian regression of piecewise constant functions. Bayesian Anal. 2 (2007), no. 4, 635--664. doi:10.1214/07-BA225. http://projecteuclid.org/euclid.ba/1340370708.

