Given a graphlab.SArray named coef:
+-------------+----------------+
|     name    |     value      |
+-------------+----------------+
| (intercept) | 87910.0724924  |
| sqft_living | 315.403440552  |
|   bedrooms  | -65080.2155528 |
|  bathrooms  | 6944.02019265  |
+-------------+----------------+
[4 rows x 2 columns]
And a graphlab.SFrame named x (first 10 rows shown below):
+-------------+----------+-----------+-------------+
| sqft_living | bedrooms | bathrooms | (intercept) |
+-------------+----------+-----------+-------------+
|    1430.0   |   3.0    |    1.0    |      1      |
|    2950.0   |   4.0    |    3.0    |      1      |
|    1710.0   |   3.0    |    2.0    |      1      |
|    2320.0   |   3.0    |    2.5    |      1      |
|    1090.0   |   3.0    |    1.0    |      1      |
|    2620.0   |   4.0    |    2.5    |      1      |
|    4220.0   |   4.0    |    2.25   |      1      |
|    2250.0   |   4.0    |    2.5    |      1      |
|    1260.0   |   3.0    |    1.75   |      1      |
|    2750.0   |   4.0    |    2.0    |      1      |
+-------------+----------+-----------+-------------+
[1000 rows x 4 columns]
How do I manipulate the SArray and SFrame so that the multiplication returns a single SArray whose first element is computed as below?
87910.0724924 * 1
+ 315.403440552 * 1430.0
+ -65080.2155528 * 3.0
+ 6944.02019265 * 1.0
= 350640.36601600994
I'm currently doing silly things: converting the SFrame / SArray into lists and then into numpy arrays to do np.multiply. Even after converting to numpy arrays, it doesn't give the right matrix-vector multiplication. My current attempt:
import numpy as np
coef  # as shown in the SArray above
x  # as shown in the SFrame above
intercept = list(x['(intercept)'])
sqftliving = list(x['sqft_living'])
bedrooms = list(x['bedrooms'])
bathrooms = list(x['bathrooms'])
x_new = np.column_stack((intercept, sqftliving, bedrooms, bathrooms))
coef_new = np.array(list(coef['value']))
np.multiply(coef_new, x_new)
(wrong) [out]:
[[ 87910.07249236 451026.91998949 -195240.64665846 6944.02019265]
[ 87910.07249236 930440.14962867 -260320.86221128 20832.06057795]
[ 87910.07249236 539339.88334408 -195240.64665846 13888.0403853 ]
...,
[ 87910.07249236 794816.67019127 -260320.86221128 17360.05048162]
[ 87910.07249236 728581.94767533 -260320.86221128 17360.05048162]
[ 87910.07249236 321711.50936313 -130160.43110564 5208.01514449]]
The output of my attempt is wrong too: it should return a single vector of scalar values, one per row. There must be an easier way to do it.
How do I manipulate the SArray and SFrame so that the multiplication returns a single SArray whose first element is computed as shown above?
And with numpy arrays, how should one perform the matrix-vector multiplication?
I think your best bet is to convert both the SFrame and SArray to numpy arrays and use the numpy dot method.
import graphlab
sf = graphlab.SFrame({'a': [1., 2.], 'b': [3., 5.], 'c': [7., 11]})
sa = graphlab.SArray([1., 2., 3.])
X = sf.to_dataframe().values
y = sa.to_numpy()
ans = X.dot(y)
I'm using simpler data here than what you have, but this should work for you as well. The only complication I can see is that you have to make sure the values in your SArray are in the same order as the columns in your SFrame (in your example they aren't).
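To make the ordering point concrete, here is a minimal numpy-only sketch using the first three rows of x and the values of coef from the question. The names x_columns, coef_by_name, w, predictions and rowwise are made up for illustration, and the data is typed in by hand rather than pulled from GraphLab. It also shows that the element-wise np.multiply output in the question was only one step short: summing each row of those products gives the same vector as dot.

```python
import numpy as np

# Column order as it appears in the SFrame x from the question.
x_columns = ["sqft_living", "bedrooms", "bathrooms", "(intercept)"]

# First three rows of x, copied by hand from the question.
X = np.array([
    [1430.0, 3.0, 1.0, 1.0],
    [2950.0, 4.0, 3.0, 1.0],
    [1710.0, 3.0, 2.0, 1.0],
])

# Coefficients keyed by name, copied from coef in the question.
coef_by_name = {
    "(intercept)": 87910.0724924,
    "sqft_living": 315.403440552,
    "bedrooms": -65080.2155528,
    "bathrooms": 6944.02019265,
}

# Reorder the coefficients so they line up with the columns of X.
w = np.array([coef_by_name[name] for name in x_columns])

# Matrix-vector product: one value per row of X.
predictions = X.dot(w)

# Same result via element-wise products summed along each row.
rowwise = np.multiply(X, w).sum(axis=1)

print(predictions[0])
```

The first entry of predictions should agree with the hand-computed 350640.366016 from the question, up to floating-point rounding.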
I think this can be done with an SFrame apply as well, but unless you have a lot of data, the dot product route is probably simpler.
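For the row-wise route, a rough sketch of the per-row function might look like the following. It assumes each row arrives as a dict keyed by column name, which is how SFrame.apply passes rows to its function; predict_row and weights are hypothetical names, and the sample row is the first row of x from the question. Looking values up by name means the column order no longer matters.

```python
def predict_row(row, weights):
    """Weighted sum of one row; both arguments are dicts keyed by column name."""
    return sum(weights[name] * row[name] for name in weights)


# Coefficients copied from coef in the question (hypothetical dict form).
weights = {
    "(intercept)": 87910.0724924,
    "sqft_living": 315.403440552,
    "bedrooms": -65080.2155528,
    "bathrooms": 6944.02019265,
}

# First row of x from the question, written as a plain dict.
first_row = {"sqft_living": 1430.0, "bedrooms": 3.0, "bathrooms": 1.0, "(intercept)": 1}

first_prediction = predict_row(first_row, weights)
print(first_prediction)

# With GraphLab installed, the same function could presumably be applied as:
# predictions = x.apply(lambda row: predict_row(row, weights))
```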