Open Access
September 2008
Sequential category aggregation and partitioning approaches for multi-way contingency tables based on survey and census data
L. Fraser Jackson, Alistair G. Gray, Stephen E. Fienberg
Ann. Appl. Stat. 2(3): 955-981 (September 2008). DOI: 10.1214/08-AOAS175

Abstract

Large contingency tables arise in many contexts, but especially in the collection of survey and census data by government statistical agencies. Because the vast majority of the variables in this context have a large number of categories, agencies and users need a systematic way of constructing tables that summarize such contingency tables. We propose such an approach in this paper: we find members of a class of restricted log-linear models that maximize the likelihood of the data and use them to obtain a parsimonious representation of the table. In contrast with more standard approaches for model search in hierarchical log-linear models (HLLMs), our procedure systematically reduces the number of categories of the variables. Through a series of examples, we illustrate the extent to which it can preserve the interaction structure found with HLLMs and be used as a data simplification procedure prior to hierarchical log-linear modeling. A feature of the procedure is that it can easily be applied to many tables with millions of cells, providing a new way of summarizing large data sets in many disciplines. The focus is on information and description rather than statistical testing. The procedure can treat each variable in the table differently: preserving full detail, treating it as fully nominal, or preserving ordinality.
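As a rough illustration of the idea of sequential category aggregation, the sketch below greedily merges row categories of a small two-way table, at each step choosing the pair whose merger loses the least mutual information between rows and columns. For a table with N observations, 2N times the mutual information equals the likelihood-ratio statistic for independence, so this criterion is likelihood-based, but it is only an assumed stand-in: the paper works with restricted log-linear models on multi-way tables, and the function names (`mutual_information`, `merge_rows`, `greedy_merge`), the `ordinal` flag, and the toy counts are all hypothetical, not the authors' algorithm or data.

```python
import math
from itertools import combinations


def mutual_information(table):
    """Mutual information (in nats) between the rows and columns of a count table."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing
                mi += (n / total) * math.log(n * total / (row_sums[i] * col_sums[j]))
    return mi


def merge_rows(rows, a, b):
    """Combine entries a < b of a list into one entry placed at position a."""
    merged = [x + y for x, y in zip(rows[a], rows[b])]
    return rows[:a] + [merged] + rows[a + 1:b] + rows[b + 1:]


def greedy_merge(table, labels, target_rows, ordinal=False):
    """Merge row categories one pair at a time until target_rows remain.

    With ordinal=True only adjacent categories may be merged, which keeps
    the original ordering of the categories intact.
    """
    table = [list(row) for row in table]
    groups = [(label,) for label in labels]
    while len(table) > target_rows:
        if ordinal:
            pairs = [(i, i + 1) for i in range(len(table) - 1)]
        else:
            pairs = list(combinations(range(len(table)), 2))
        # Keep the merger that retains the most row-column information.
        a, b = max(pairs, key=lambda p: mutual_information(merge_rows(table, *p)))
        table = merge_rows(table, a, b)
        groups = groups[:a] + [groups[a] + groups[b]] + groups[a + 1:b] + groups[b + 1:]
    return table, groups


# Toy counts (made up): rows A and B share one column profile, C and D another.
counts = [[10, 30], [20, 60], [40, 10], [80, 20]]
merged, groups = greedy_merge(counts, ["A", "B", "C", "D"], target_rows=2)
print(groups)
print(merged)
```

In this toy table the rows within each merged group have proportional counts, so collapsing them loses no information about the row-column association. Passing `ordinal=True` restricts the search to adjacent categories, loosely mirroring the abstract's distinction between nominal and ordinal treatment of a variable.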

Citation

L. Fraser Jackson, Alistair G. Gray, Stephen E. Fienberg. "Sequential category aggregation and partitioning approaches for multi-way contingency tables based on survey and census data." Ann. Appl. Stat. 2 (3) 955-981, September 2008. https://doi.org/10.1214/08-AOAS175

Information

Published: September 2008
First available in Project Euclid: 13 October 2008

zbMATH: 1149.62049
MathSciNet: MR2516799
Digital Object Identifier: 10.1214/08-AOAS175

Keywords: Collapsibility, Kullback–Leibler distance, level merging, log-linear modeling, partitioning information, reducing dimensionality

Rights: Copyright © 2008 Institute of Mathematical Statistics

Vol. 2 • No. 3 • September 2008