Facultade de Fisioterapia

A two-sample test for the equality of univariate marginal distributions for high-dimensional data

Cousido Rocha, Marta; de Uña-Álvarez, Jacobo; Hart, Jeffrey D.
Abstract:
A recurring theme in modern statistics is dealing with high-dimensional data whose main feature is a large number, p, of variables but a small sample size. In this context our aim is to address the problem of testing the null hypothesis that the marginal distributions of p variables are the same for two groups. We propose a test statistic motivated by the simple idea of comparing, for each of the p variables, the empirical characteristic functions computed from the two samples. The asymptotic normality of the test statistic is derived under mixing conditions. In our asymptotic analysis the number of variables tends to infinity, while the size of individual samples remains fixed. In order to obtain a practical test several estimators of the variance are proposed, leading to three somewhat different versions of the test. An alternative global test based on the P-values derived from permutation tests is also proposed. A simulation study to investigate the finite sample properties of the proposed tests is carried out, and a practical illustration involving microarray data is provided. © 2019 Elsevier Inc.
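The idea of comparing, variable by variable, the empirical characteristic functions (ECFs) of the two samples can be sketched as follows. This is only an illustration under stated assumptions, not the statistic or the variance estimators of the article: the standard normal weight function, the plain average over the p variables, and the row-permutation calibration are all choices made here for simplicity, and the function names `ecf_distance`, `average_ecf_distance` and `permutation_pvalue` are hypothetical. With a standard normal weight, the integrated squared difference between two ECFs has a closed form in terms of the kernel exp(-d^2/2) applied to pairwise differences.

```python
import numpy as np


def ecf_distance(x, y):
    """Integrated squared difference between the ECFs of x and y.

    Assumption: the weight function is the N(0, 1) density, which gives
    the closed form below with kernel exp(-d^2 / 2).
    """
    def mean_kernel(a, b):
        d = a[:, None] - b[None, :]          # all pairwise differences
        return np.exp(-0.5 * d ** 2).mean()  # average Gaussian kernel value

    return mean_kernel(x, x) + mean_kernel(y, y) - 2.0 * mean_kernel(x, y)


def average_ecf_distance(X, Y):
    """Average the per-variable ECF distances over the p columns.

    X has shape (n, p) and Y has shape (m, p).
    """
    return float(np.mean([ecf_distance(X[:, j], Y[:, j])
                          for j in range(X.shape[1])]))


def permutation_pvalue(X, Y, n_perm=200, seed=0):
    """Permutation p-value for the averaged distance.

    Assumption: whole rows (subjects) are permuted between groups, so the
    dependence among the p variables is preserved in each resample.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    pooled = np.vstack([X, Y])
    observed = average_ecf_distance(X, Y)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = average_ecf_distance(pooled[idx[:n]], pooled[idx[n:]])
        count += int(stat >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(6, 200))            # small n, large p
    Y = rng.normal(loc=3.0, size=(6, 200))   # shifted marginals
    print(permutation_pvalue(X, Y))
```

The sketch mirrors the small-sample, many-variables setting described above, but it replaces the asymptotic normal calibration of the article with a generic permutation calibration, so its results should not be read as reproducing the proposed tests.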
Year:
2019
Type of Publication:
Article
Keywords:
Characteristic functions; Goodness of fit tests; Mixing conditions; Permutation tests
Journal:
Journal of Multivariate Analysis
Volume:
174
Number:
104537
Month:
November
Note:
Q3 62/123 h-index 1.029 (JCR2018)
Comments:
This work received financial support from the Call 2015 Grants for Ph.D. contracts for training of doctors of the Ministry of Economy and Competitiveness, cofinanced by the European Social Fund (Ref. BES-2015-074958). We acknowledge support from the MTM2014-55966-P project, Ministry of Economy and Competitiveness, Spain, and the MTM2017-89422-P project, Ministry of Economy, Industry and Competitiveness, Spain, State Research Agency, Spain, and Regional Development Fund, EU. We also acknowledge the financial support provided by the SiDOR (Statistical Inference, Decision and Operations Research) group, Spain, through the grant Competitive Reference Group, 2016–2019 (ED431C 2016/040), funded by the “Consellería de Cultura, Educación e Ordenación Universitaria, Xunta de Galicia, Spain”. Support from the “Xunta de Galicia (Centro singular de investigación de Galicia accreditation 2016-2019)” and the European Union (European Regional Development Fund - ERDF) is gratefully acknowledged. Finally, the first author would like to thank the University of Vigo, Spain, and its Escola Internacional de Doutoramento (EIDO) for the financial support provided through mobility doctorate grants.
DOI:
10.1016/j.jmva.2019.104537