Search code examples
phpsvdeigenvectoreigenvalue

Singular Value Decomposition (SVD) in PHP


I would like to implement Singular Value Decomposition (SVD) in PHP. I know that there are several external libraries which could do this for me. But I have two questions concerning PHP, though: 1) Do you think it's possible and/or reasonable to code the SVD in PHP? 2) If (1) is yes: Can you help me to code it in PHP?

I've already coded some parts of SVD by myself. Here's the code which I made comments to the course of action in. Some parts of this code aren't completely correct.

It would be great if you could help me. Thank you very much in advance!


Solution

  • SVD-python Is a very clear, parsimonious implementation of the SVD. It's practically psuedocode and should be fairly easy to understand and compare/draw on for your php implementation, even if you don't know much python.

    SVD-python

    That said, as others have mentioned I wouldn't expect to be able to do very heavy-duty LSA with php implementation what sounds like a pretty limited web-host.

    Cheers

    Edit: The module above doesn't do anything all by itself, but there is an example included in the opening comments. Assuming you downloaded the python module, and it was accessible (e.g. in the same folder), you could implement a trivial example as follow,

    #!/usr/bin/python
    import svd
    import math
    
    a = [[22.,10., 2.,  3., 7.],
         [14., 7.,10.,  0., 8.],
         [-1.,13.,-1.,-11., 3.],
         [-3.,-2.,13., -2., 4.],
         [ 9., 8., 1., -2., 4.],
         [ 9., 1.,-7.,  5.,-1.],
         [ 2.,-6., 6.,  5., 1.],
         [ 4., 5., 0., -2., 2.]]
    
    u,w,vt = svd.svd(a)
    print w
    

    Here 'w' contains your list of singular values.
    Of course this only gets you part of the way to latent semantic analysis and its relatives. You usually want to reduce the number of singular values, then employ some appropriate distance metric to measure the similarity between your documents, or words, or documents and words, etc. The cosine of the angle between your resultant vectors is pretty popular.

    Latent Semantic Mapping (pdf)

    is by far the clearest, most concise and informative paper I've read on the remaining steps you need to work out following the SVD.

    Edit2: also note that if you're working with very large term-document matrices (I'm assuming this is what you are doing) it is almost certainly going to be far more efficient to perform the decomposition in an offline mode, and then perform only the comparisons in a live fashion in response to requests. while svd-python is great for learning, the svdlibc is more what you would want for such heavy computation.

    finally as mentioned in the bellegarda paper above, remember that you don't have to recompute the svd every single time you get a new document or request. depending on what you are trying to do you could probably get away with performing the svd once every week or so, in an offline mode, a local machine, and then uploading the results (size/bandwidth concerns notwithstanding).

    anyway good luck!