Perturbation analysis of randomized SVD and its applications to high-dimensional statistics

Abstract

Randomized singular value decomposition (RSVD) is a class of computationally efficient algorithms for computing the truncated SVD of large data matrices. Given a symmetric matrix M, the prototypical RSVD algorithm outputs an approximation of the k leading singular vectors of M by computing the SVD of M^g G; here g is a positive integer and G is a random Gaussian sketching matrix. In this paper we study the statistical properties of RSVD under a general "signal-plus-noise" framework, i.e., the observed matrix M is assumed to be an additive perturbation of some true but unknown signal matrix S. We first derive upper bounds for the spectral norm and two-to-infinity norm distances between the approximate singular vectors of M and the true singular vectors of S. These upper bounds depend on the signal-to-noise ratio (SNR) and the number of power iterations g. A phase transition phenomenon is observed: the smaller the SNR, the larger the value of g required to guarantee convergence of the spectral and two-to-infinity distances. We also show that the thresholds for g at which these phase transitions occur are sharp whenever the noise matrices satisfy a certain trace growth condition. Finally, we derive normal approximations for the row-wise fluctuations of these approximate singular vectors and for the entrywise fluctuations of the projection of M onto these vectors. We illustrate our theoretical results by deriving nearly optimal performance guarantees for RSVD when applied to three statistical inference problems, namely, community detection, matrix completion, and PCA with missing data.
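To make the prototypical algorithm concrete, here is a minimal NumPy sketch of RSVD with g power iterations as described above. The function name rsvd, the oversample parameter, and the seed argument are illustrative additions, not notation from the paper; this is a sketch under the stated symmetric-M assumption, not the authors' reference implementation.

```python
import numpy as np

def rsvd(M, k, g, oversample=10, seed=None):
    """Sketch of the prototypical RSVD for a symmetric matrix M.

    Approximates the k leading singular vectors of M by computing the
    SVD of M^g G, where G is a random Gaussian sketching matrix.
    """
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    # Random Gaussian sketching matrix with a small oversampling margin.
    G = rng.standard_normal((n, k + oversample))
    # g power iterations: Y = M^g G.
    Y = G
    for _ in range(g):
        Y = M @ Y
    # SVD of the sketched matrix; its left singular vectors approximate
    # the k leading singular vectors of M.
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k]
```

For large g, forming M^g G directly can lose accuracy in floating point, so practical implementations typically re-orthonormalize Y (e.g., via np.linalg.qr) after each multiplication; in exact arithmetic the output subspace is the same.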

Publication
arXiv preprint