03/11/2017 | TEXAS A&M UNIVERSITY
“A Two-Sample Test for the Equality of Densities from High-Dimensional Data Under Mixing Conditions”
ABSTRACT
A recurring theme in modern statistics is high-dimensional data, in which the dimension p (the number of variables) is large while the sample size is relatively small. In this context, we address the problem of testing the null hypothesis that, for each of the p variables, the density is the same in two groups of individuals. We propose a test statistic motivated by a simple idea: for each of the p variables, compare the kernel density estimators computed from the two samples. The asymptotic normality of the test statistic is derived under mixing conditions. In the asymptotic analysis, the number of variables tends to infinity, in keeping with the high-dimensional setting, while the sample sizes remain fixed. To compute the test statistic in practice, three variance estimators are proposed. A simulation study investigates the main properties of the proposed test under each of the three variance estimators. Its main conclusion is that the proposed test respects the nominal level and exhibits good power under specific alternatives. A practical illustration involving microarray data is provided. We also address the problem of identifying which variables have different densities in the two populations when the global null hypothesis is rejected.
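The idea of comparing, variable by variable, the kernel density estimators from the two samples can be illustrated with a minimal Python sketch. It is not the test statistic of the talk: the Gaussian kernel, the rule-of-thumb bandwidth, the integrated squared difference as the measure of discrepancy, and the function names `gaussian_kde_on_grid` and `per_variable_l2_distance` are all assumptions made for illustration. The standardization by a variance estimator, which the abstract mentions but does not specify, is deliberately left out.

```python
import numpy as np


def gaussian_kde_on_grid(sample, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `sample` at each grid point."""
    # Broadcasting gives a (grid_size, n) matrix of standardized differences.
    z = (grid[:, None] - sample[None, :]) / bandwidth
    norm = len(sample) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm


def per_variable_l2_distance(x, y, grid_size=256):
    """Integrated squared difference between the two KDEs, one value per variable.

    x has shape (n1, p) and y has shape (n2, p); column j holds variable j.
    Assumes every variable has nonzero pooled variance.
    """
    p = x.shape[1]
    distances = np.empty(p)
    for j in range(p):
        pooled = np.concatenate([x[:, j], y[:, j]])
        # Assumed bandwidth: normal-reference rule of thumb on the pooled sample.
        h = 1.06 * pooled.std(ddof=1) * len(pooled) ** (-0.2)
        grid = np.linspace(pooled.min() - 3.0 * h, pooled.max() + 3.0 * h, grid_size)
        f_hat = gaussian_kde_on_grid(x[:, j], grid, h)
        g_hat = gaussian_kde_on_grid(y[:, j], grid, h)
        # Riemann-sum approximation of the integral of (f_hat - g_hat)^2.
        distances[j] = np.sum((f_hat - g_hat) ** 2) * (grid[1] - grid[0])
    return distances


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 1000))   # small samples, many variables
    y = rng.normal(size=(8, 1000))
    d = per_variable_l2_distance(x, y)
    print("average per-variable discrepancy:", d.mean())
```

Averaging the per-variable discrepancies over the p variables gives a single summary whose fluctuations shrink as p grows, which is the intuition behind an asymptotic analysis in p with fixed sample sizes. The individual values in `d` are also a natural starting point for asking which variables differ once a global null hypothesis is rejected.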