Speaker:
Alfred Hero (University of Michigan)
Title:
Correlation screening in high dimension
Abstract:
The discovery of a few highly correlated variables among a large number of measured variables is of interest in many fields of science. A natural method for accomplishing this discovery is correlation screening: thresholding the sample correlation matrix. However, when the number of variables is larger than the number of samples, the sample correlation matrix is singular, and thresholding it can yield many false positives. Indeed, there exists an abrupt phase transition in the average number of false positives as a function of the correlation threshold. We apply the theory of random Euclidean graphs and random matrix theory to derive mathematical expressions for the phase transition threshold, the false positive rate, and the false negative rate for correlation screening in single and multiple populations of multivariate measurements.
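As a rough illustration of the screening step described in the abstract, the following Python sketch thresholds the sample correlation matrix of simulated data with far more variables than samples. It is not taken from the talk: the function name `correlation_screen`, the sizes `n` and `p`, the threshold values, and the use of independent Gaussian data are all assumptions chosen for illustration. Because the simulated variables are independent, every pair reported is a false positive.

```python
import numpy as np


def correlation_screen(X, rho):
    """Return index pairs (i, j), i < j, whose absolute sample
    correlation exceeds the threshold rho.

    X is an (n samples) x (p variables) array. Hypothetical helper,
    written for illustration only.
    """
    R = np.corrcoef(X, rowvar=False)            # p x p sample correlation matrix
    iu, ju = np.triu_indices(R.shape[0], k=1)   # each unordered pair once
    mask = np.abs(R[iu, ju]) > rho
    return list(zip(iu[mask].tolist(), ju[mask].tolist()))


# Assumed toy setting: p >> n, all variables independent.
rng = np.random.default_rng(0)
n, p = 10, 500
X = rng.standard_normal((n, p))

# Count discoveries (all false positives here) as the threshold is lowered.
for rho in (0.95, 0.9, 0.8, 0.7):
    print(rho, len(correlation_screen(X, rho)))
```

Running the loop over a finer grid of thresholds is one simple way to observe empirically how quickly the number of false discoveries grows as the threshold drops.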