Open Access
Linear regression with sparsely permuted data
Martin Slawski, Emanuel Ben-David
Electron. J. Statist. 13(1): 1-36 (2019). DOI: 10.1214/18-EJS1498

Abstract

In regression analysis of multivariate data, it is tacitly assumed that response and predictor variables in each observed response-predictor pair correspond to the same entity or unit. In this paper, we consider the situation of “permuted data” in which this basic correspondence has been lost. Several recent papers have considered this situation without further assumptions on the underlying permutation. In applications, the latter is often known to have additional structure that can be leveraged. Specifically, we herein consider the common scenario of “sparsely permuted data” in which only a small fraction of the data is affected by a mismatch between response and predictors. However, an adverse effect already observed for sparsely permuted data is that the least squares estimator, as well as other estimators not accounting for such partial mismatch, is inconsistent. One approach studied in detail herein is to treat permuted data as outliers, which motivates the use of robust regression formulations to estimate the regression parameter. The resulting estimate can subsequently be used to recover the permutation. A notable benefit of the proposed approach is its computational simplicity, given the general lack of procedures for the above problem that are both statistically sound and computationally appealing.
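
The two-stage idea described above (a robust fit that treats mismatched pairs as outliers, followed by recovery of the permutation) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the Huber-type loss fitted by iteratively reweighted least squares, the tuning constant `delta=1.345`, the median-absolute-deviation scale estimate, the simulated sizes `n`, `d`, `k`, and the helper names `huber_irls` and `recover_permutation`. The matching step uses the fact that, for scalar responses, pairing sorted responses with sorted fitted values minimizes the sum of squared differences.

```python
import numpy as np


def huber_irls(X, y, delta=1.345, n_iter=50):
    """Huber-type M-estimate of the regression parameter via IRLS (illustrative)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # initialize at least squares
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust residual scale: normalized median absolute deviation.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        a = np.abs(r) / scale
        # Huber weights: 1 for small residuals, delta/|r| for large ones.
        w = np.where(a <= delta, 1.0, delta / np.maximum(a, delta))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta


def recover_permutation(X, y, beta):
    """Match each response to a row of X by sorting responses and fitted values."""
    fitted = X @ beta
    perm = np.empty(len(y), dtype=int)
    # The k-th smallest response is paired with the k-th smallest fitted value.
    perm[np.argsort(y)] = np.argsort(fitted)
    return perm  # perm[i] is the row of X matched to y[i]


# Simulated sparsely permuted data (all sizes and values are hypothetical).
rng = np.random.default_rng(0)
n, d, k = 500, 5, 50
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5, 3.0, -1.5])
y_clean = X @ beta_true + 0.1 * rng.normal(size=n)

# Mismatch only k of the n responses by cyclically shifting them.
y = y_clean.copy()
idx = rng.choice(n, size=k, replace=False)
y[idx] = y_clean[np.roll(idx, 1)]

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_rob = huber_irls(X, y)
err_ols = np.linalg.norm(beta_ols - beta_true)
err_rob = np.linalg.norm(beta_rob - beta_true)
print("least squares error:", err_ols)
print("robust fit error:   ", err_rob)

perm_hat = recover_permutation(X, y, beta_rob)
print("responses matched to a different row:", int(np.sum(perm_hat != np.arange(n))))
```

In this simulation the mismatched responses are unrelated to their predictors, so the least squares fit is pulled toward zero, while the downweighting in the robust fit limits their influence. The sorting-based matching is only one simple way to use the estimate; with noisy responses it can also reorder correctly matched pairs whose fitted values are close.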

Citation

Martin Slawski, Emanuel Ben-David. "Linear regression with sparsely permuted data." Electron. J. Statist. 13(1): 1-36, 2019. https://doi.org/10.1214/18-EJS1498

Information

Received: 1 November 2017; Published: 2019
First available in Project Euclid: 4 January 2019

zbMATH: 07003256
MathSciNet: MR3896144
Digital Object Identifier: 10.1214/18-EJS1498

Subjects:
Primary: 62F35, 62J05, 90C10

Keywords: broken sample, entity resolution, quadratic assignment problem, record linkage, robust regression

Vol.13 • No. 1 • 2019