For this assignment you are going to implement Edo Liberty’s Matrix Sketching algorithm [1]. Although this procedure was inspired by the family of Misra–Gries Streaming Algorithms, you do not have to implement it as such. That is, a non-streaming version should be fine.
You should take a look at Dr. Liberty’s talk (click here) at the Simons Institute to get at a possible implementation-strategy for this assignment. Essentially we have a (large) data-matrix A ∈ Rm×n, where n ≫ m. The NEWMAT library has a robust and fast implementation of the SVD-Decomposition algorithm – SVD(A, D, U, V); – where A = U × D × V.t(). As you can gather from the online reference, NEWMAT assumes A is an m × n matrix where m ≥ n. If n ≫ m, then you have to work with A.t(), and make appropriate changes to U and V to make things work.
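In case it helps, the reason swapping U and V works is just the transpose identity for the SVD (this is linear algebra, not anything specific to NEWMAT):

\[
A = U D V^{T} \;\Longrightarrow\; A^{T} = \left(U D V^{T}\right)^{T} = V D U^{T}.
\]

So if you feed A.t() to NEWMAT’s SVD and it hands back factors satisfying A.t() = U × D × V.t(), then A itself factors as V × D × U.t() – i.e. the roles of U and V are simply exchanged.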
I have provided a hint in hint.cpp on Compass. Keep in mind that I tend to have my datasets appear in columnar-format. That said, you have to figure out the implementation details yourself. I have also provided two datasets on Compass. Sample outputs (using these two datasets) are shown in figure 1.
You have to take a measured-and-slow approach to getting this assignment done. The first part involves understanding Dr. Liberty’s algorithm. Following this, you have to think about a slightly-different implementation (from what is in his talk) – and this has to do with zeroing-out just the lowest singular value (as opposed to everything below the median singular value). Then, there is the issue of getting NEWMAT’s SVD to do the needful. In the midst of all this, there are routine things like computing the Frobenius Norm of a matrix (which, BTW, can also be computed using its Singular Values). As you can see from the Frobenius Norms of the Original- and Sketch-Matrices in figure 1, my implementation comfortably meets the ε-specification (i.e. the reduction in Frobenius-Norm is not much at all).
References
[1] Edo Liberty. Simple and deterministic matrix sketching. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’13, pages 581–588, New York, NY, USA, 2013. ACM.
Figure 1: An illustration of the command-line variables.