June 22, 2015

An Implementation of a Randomized Algorithm for Principal Component Analysis

ArXiv PrePrint

By: Arthur Szlam, Mark Tygert, Yuval Kluger

Abstract

Recent years have witnessed intense development of randomized methods for low-rank approximation. These methods target principal component analysis (PCA) and the calculation of truncated singular value decompositions (SVD). The present paper presents an essentially black-box, fool-proof implementation for Mathworks’ MATLAB, a popular software platform for numerical computation. As illustrated via several tests, the randomized algorithms for low-rank approximation outperform or at least match the classical techniques (such as Lanczos iterations) in basically all respects: accuracy, computational efficiency (both speed and memory usage), ease-of-use, parallelizability, and reliability. However, the classical procedures remain the methods of choice for estimating spectral norms, and are far superior for calculating the least singular values and corresponding singular vectors (or singular subspaces).