
Matrix multiplication with SFrame and SArray with Graphlab and/or Numpy


Given a graphlab.SArray named coef:

+-------------+----------------+
|     name    |     value      |
+-------------+----------------+
| (intercept) | 87910.0724924  |
| sqft_living | 315.403440552  |
|   bedrooms  | -65080.2155528 |
|  bathrooms  | 6944.02019265  |
+-------------+----------------+
[4 rows x 2 columns]

And a graphlab.SFrame named x (first 10 rows shown below):

+-------------+----------+-----------+-------------+
| sqft_living | bedrooms | bathrooms | (intercept) |
+-------------+----------+-----------+-------------+
|    1430.0   |   3.0    |    1.0    |      1      |
|    2950.0   |   4.0    |    3.0    |      1      |
|    1710.0   |   3.0    |    2.0    |      1      |
|    2320.0   |   3.0    |    2.5    |      1      |
|    1090.0   |   3.0    |    1.0    |      1      |
|    2620.0   |   4.0    |    2.5    |      1      |
|    4220.0   |   4.0    |    2.25   |      1      |
|    2250.0   |   4.0    |    2.5    |      1      |
|    1260.0   |   3.0    |    1.75   |      1      |
|    2750.0   |   4.0    |    2.0    |      1      |
+-------------+----------+-----------+-------------+
[1000 rows x 4 columns]

How do I manipulate the SArray and SFrame so that the multiplication returns a single SArray vector whose first row is computed as below?

   87910.0724924   * 1 
+    315.403440552 * 1430.0 
+ -65080.2155528   * 3.0
+   6944.02019265  * 1.0 
= 350640.36601600994

I'm currently doing silly things: converting the SFrame / SArray into lists and then converting those into numpy arrays to do np.multiply. Even after converting to numpy arrays, I'm not getting the right matrix-vector multiplication. My current attempt:

import numpy as np
coef  # as shown in the SArray above.
x  # as shown in the SFrame above.
intercept = list(x['(intercept)'])
sqftliving =  list(x['sqft_living'])
bedrooms =  list(x['bedrooms'])
bathrooms =  list(x['bathrooms'])
x_new = np.column_stack((intercept, sqftliving, bedrooms, bathrooms))

coef_new = np.array(list(coef['value']))

np.multiply(coef_new, x_new)

(wrong) [out]:

[[  87910.07249236  451026.91998949 -195240.64665846    6944.02019265]
 [  87910.07249236  930440.14962867 -260320.86221128   20832.06057795]
 [  87910.07249236  539339.88334408 -195240.64665846   13888.0403853 ]
 ..., 
 [  87910.07249236  794816.67019127 -260320.86221128   17360.05048162]
 [  87910.07249236  728581.94767533 -260320.86221128   17360.05048162]
 [  87910.07249236  321711.50936313 -130160.43110564    5208.01514449]]

The output of my attempt is wrong too: it should return a single vector of scalar values, one per row. There must be an easier way to do it.

How do I manipulate the SArray and SFrame so that the multiplication returns a single SArray vector whose first row is computed as shown above?

And with numpy arrays or DataFrames, how should one perform the matrix-vector multiplication?


Solution

  • I think your best bet is to convert both the SFrame and SArray to numpy arrays and use the numpy dot method.

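    import graphlab

    # Small example data: a 2x3 SFrame and a length-3 SArray.
    sf = graphlab.SFrame({'a': [1., 2.], 'b': [3., 5.], 'c': [7., 11.]})
    sa = graphlab.SArray([1., 2., 3.])

    # Convert both to numpy arrays.
    X = sf.to_dataframe().values
    y = sa.to_numpy()

    # Matrix-vector product: one value per row of X.
    ans = X.dot(y)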
    

    I'm using simpler data here than what you have, but this should work for you as well. The only complication I can see is that you have to make sure the values in your SArray are in the same order as the columns in your SFrame (in your example they aren't).

    I think this can be done with an SFrame apply as well, but unless you have a lot of data, the dot product route is probably simpler.
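To make the ordering point concrete, here is a minimal sketch using plain numpy only (no graphlab), with the coefficients and the first two rows from the question typed in by hand. The names `columns`, `w`, `pred` and `elementwise` are illustrative, not part of any library. It also shows why np.multiply alone gave a matrix: it broadcasts element-wise, and the row sums are still missing.

```python
import numpy as np

# Coefficients keyed by name, copied from the question's `coef` table.
coef = {
    "(intercept)": 87910.0724924,
    "sqft_living": 315.403440552,
    "bedrooms": -65080.2155528,
    "bathrooms": 6944.02019265,
}

# Column order of the question's `x`, followed by its first two rows.
columns = ["sqft_living", "bedrooms", "bathrooms", "(intercept)"]
X = np.array([
    [1430.0, 3.0, 1.0, 1.0],
    [2950.0, 4.0, 3.0, 1.0],
])

# Reorder the coefficients so they line up with the columns of X.
w = np.array([coef[name] for name in columns])

# Matrix-vector product: one value per row of X.
pred = X.dot(w)

# np.multiply broadcasts element-wise and keeps the (2, 4) shape;
# summing across each row recovers the dot product.
elementwise = np.multiply(X, w)

print(pred.shape)         # (2,)
print(elementwise.shape)  # (2, 4)
print(pred[0])            # approximately 350640.366, as in the question
```

Looking the coefficients up by column name avoids relying on the two tables happening to share an order.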