I have been trying to use the principal-components function from Incanter to do PCA, but I seem to be off track in using it. I found some sample data online from a PCA tutorial and wanted to practice on it:
(def data [[0.69 0.49] [-1.31 -1.21] [0.39 0.99] [0.09 0.29] [1.29 1.09]
           [0.49 0.79] [0.19 -0.31] [-0.81 -0.81]
           [-0.31 -0.31] [-0.71 -1.01]])
On my first attempt at PCA I tried passing the vectors to Incanter's matrix function, but found I was passing it too many arguments. I then decided to try the nested vector structure defined above, but I would like to avoid this route.
How would I turn data into an Incanter matrix that the principal-components function will accept as input? For simplicity, let's call the new matrix fooMatrix.
Once this matrix, fooMatrix, has been constructed, the following code should extract the first two principal components:
(def pca (principal-components fooMatrix))
(def components (:rotation pca))
(def pc1 (sel components :cols 0))
(def pc2 (sel components :cols 1))
and then the data can be projected onto the principal components with:
(def principal1 (mmult fooMatrix pc1))
(def principal2 (mmult fooMatrix pc2))
Check out the Incanter API. I believe you just want (incanter.core/matrix data). These are your options for Incanter's matrix function. Maybe A2 is what you're interested in.
(def A (matrix [[1 2 3] [4 5 6] [7 8 9]])) ; produces a 3x3 matrix
(def A2 (matrix [1 2 3 4 5 6 7 8 9] 3)) ; produces the same 3x3 matrix
(def B (matrix [1 2 3 4 5 6 7 8 9])) ; produces a 9x1 column vector
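The correspondence between the A and A2 forms can be checked in plain Clojure, without Incanter. This is only a sketch of the idea: a flat sequence with a column count of 3 is read row by row, which is what chunking it with partition does. The names flat, ncols and rows are illustrative, not part of Incanter.

```clojure
;; Flat input and column count, as in (matrix [1 2 3 4 5 6 7 8 9] 3).
(def flat [1 2 3 4 5 6 7 8 9])
(def ncols 3)

;; Chunk the flat sequence into rows of ncols values each.
(def rows (mapv vec (partition ncols flat)))

rows
;; => [[1 2 3] [4 5 6] [7 8 9]]

;; Going the other way: flatten nested rows back into one vector.
(vec (apply concat rows))
;; => [1 2 3 4 5 6 7 8 9]
```

So the nested and flat forms carry the same information; which one you pass to matrix is a matter of convenience.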
Example using your data:
user=> (use '[incanter core stats charts datasets])
nil
user=> (def data [0.69 0.49 -1.31 -1.21 0.39 0.99 0.09 0.29 1.29
                  1.09 0.49 0.79 0.19 -0.31 -0.81 -0.81
                  -0.31 -0.31 -0.71 -1.01])
user=> (def fooMatrix (matrix data 2))
user=> (principal-components fooMatrix)
{:std-dev (1.3877785387777999 0.27215937850413047), :rotation A 2x2 matrix
-------------
-7.07e-01 -7.07e-01
-7.07e-01 7.07e-01
}
Voilà. Nested vector structure gone.
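To tie this back to the projection step in the question, here is a minimal sketch of the whole pipeline. It assumes Incanter is on the classpath, and it passes the nested vectors straight to matrix, which should work just as well as the flat form. The name projected is mine; multiplying by the full rotation matrix projects onto both components in one call instead of selecting each column separately.

```clojure
(use '[incanter core stats])

;; Rows are observations, columns are variables (same values as in the question).
(def data [[0.69 0.49] [-1.31 -1.21] [0.39 0.99] [0.09 0.29] [1.29 1.09]
           [0.49 0.79] [0.19 -0.31] [-0.81 -0.81] [-0.31 -0.31] [-0.71 -1.01]])

(def fooMatrix (matrix data))                 ; 10x2 matrix from nested vectors
(def pca (principal-components fooMatrix))
(def components (:rotation pca))              ; 2x2, one component per column

;; (10x2) x (2x2) = 10x2: column 0 holds the scores on the first
;; component, column 1 the scores on the second.
(def projected (mmult fooMatrix components))

(dim projected)                               ; [rows cols] of the result
```

If you need the components separately, (sel projected :cols 0) and (sel projected :cols 1) pick out the individual score columns.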