December 2024
Incorporating auxiliary information for improved statistical inference and its extensions to distributed algorithms with an application to personal credit
Miaomiao Yu, Zhongfeng Jiang, Jiaxuan Li, Yong Zhou
Ann. Appl. Stat. 18(4): 2863-2886 (December 2024). DOI: 10.1214/24-AOAS1909

Abstract

Personal credit has long been a topic of public concern. The evaluation of default risk is of particular interest, since robust estimation based on personal information can both help individuals in need obtain loans and help financial institutions avoid losses. So far, no satisfactory solution exists because data, especially default information, are limited. In the era of big data, it has become possible to improve the effectiveness of estimates by using auxiliary information from external studies or public domains. However, individual-level data cannot be obtained directly because of data privacy concerns; only summary statistics carrying auxiliary information may be shared. To use such external summary information to improve the accuracy of default risk estimation, this paper introduces a unified auxiliary information framework, referred to as the enhanced GEE method, which incorporates various external summary results by employing the generalized estimating equations (GEE) approach and augmenting the GEE function with a weighted logarithm of the confidence density. We establish asymptotic properties of the new method and prove that it gains statistical efficiency relative to the study-specific estimator that uses no auxiliary information. In addition, we develop a low-cost Map-Reduce procedure for distributed statistical inference with the enhanced GEE method on big data, which achieves the same efficiency as the oracle enhanced GEE approach under mild conditions. The method is demonstrated in an application to predicting the loan default risk of bank customers in Shanghai, where it is shown to be more effective and reliable than the method based on internal data only. Furthermore, the advantages of our approach, especially the construction of tighter confidence intervals, are illustrated with extensive simulation studies and a real personal default risk case.
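
The efficiency gain described in the abstract can be illustrated with a deliberately simplified scalar sketch. It is not the paper's enhanced GEE estimator. It assumes a single parameter, a quadratic approximation to the internal objective, and a normal confidence density for the external summary (an estimate plus its standard error), so that adding the weighted log confidence density reduces to precision weighting. The function name `combine_with_external` and the `weight` parameter are hypothetical, and the two sources are assumed independent and unbiased.

```python
import math


def combine_with_external(theta_int, se_int, theta_ext, se_ext, weight=1.0):
    """Combine an internal estimate with an external summary statistic.

    Maximizes the (assumed) augmented objective
        -(t - theta_int)^2 / (2 se_int^2) - weight * (t - theta_ext)^2 / (2 se_ext^2),
    whose second term is a weighted log of a normal confidence density.
    Returns the combined estimate and its standard error.
    """
    prec_int = 1.0 / se_int ** 2        # precision of the internal estimate
    prec_ext = weight / se_ext ** 2     # weighted precision of the external summary
    total = prec_int + prec_ext

    # Closed-form maximizer of the quadratic objective.
    theta = (prec_int * theta_int + prec_ext * theta_ext) / total

    # Variance of the weighted average, assuming independent, unbiased sources.
    var = (prec_int + weight ** 2 / se_ext ** 2) / total ** 2
    return theta, math.sqrt(var)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    theta, se = combine_with_external(0.50, 0.20, 0.40, 0.10)
    print(f"combined estimate = {theta:.4f}, standard error = {se:.4f}")
```

With `weight=1.0` the combined standard error is smaller than either input standard error, which is the scalar analogue of the tighter confidence intervals mentioned above. Only the summary pair `(theta_ext, se_ext)` is needed from the external source, not its individual-level data.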

Funding Statement

This work is supported in part by funds from the Program of National Natural Science Foundation of China (No. 72301108), the National Key R&D Program of China (No. 2021YFA1000100 and 2021YFA1000101), the Shanghai Pujiang Program (No. 23PJC040), the State Key Program of National Natural Science Foundation of China (No. 72331005 and 92046005), and the Shanghai Science and Technology Innovation Funds (No. 23JS1400501).

Acknowledgments

The authors would like to thank the anonymous referees, the Associate Editor, and the Editor for their constructive comments that improved the quality of this paper.

Citation

Miaomiao Yu, Zhongfeng Jiang, Jiaxuan Li, Yong Zhou. "Incorporating auxiliary information for improved statistical inference and its extensions to distributed algorithms with an application to personal credit." Ann. Appl. Stat. 18(4): 2863-2886, December 2024. https://doi.org/10.1214/24-AOAS1909

Information

Received: 1 October 2023; Revised: 1 April 2024; Published: December 2024
First available in Project Euclid: 31 October 2024

Digital Object Identifier: 10.1214/24-AOAS1909

Keywords: confidence density, distributed statistical inference, external auxiliary information, generalized estimating equations, individual-level data

Rights: Copyright © 2024 Institute of Mathematical Statistics

Vol. 18 • No. 4 • December 2024