I'm using Graphlab, but I guess this question can apply to pandas.
import graphlab
sf = graphlab.SFrame({'id': [1, 2, 3], 'user_score': [{"a":4, "b":3}, {"a":5, "b":7}, {"a":2, "b":3}], 'weight': [4, 5, 2]})
I want to create a new column where the value of each element in 'user_score' is multiplied by the number in 'weight'. That is,
sf = graphlab.SFrame({'id': [1, 2, 3], 'user_score': [{"a":4, "b":3}, {"a":5, "b":7}, {"a":2, "b":3}], 'weight': [4, 5, 2], 'new': [{"a":16, "b":12}, {"a":25, "b":35}, {"a":4, "b":6}]})
I wrote the simple function below and tried to apply it, but to no avail. Any thoughts?
def trans(x, y):
    d = dict()
    for k, v in x.items():
        d[k] = v*y
    return d
sf.apply(trans(sf['user_score'], sf['weight']))
I got the following error message:
AttributeError: 'SArray' object has no attribute 'items'
This is subtle, but I think what you want is this:
sf.apply(lambda row: trans(row['user_score'], row['weight']))
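The apply function takes a function as its argument and calls it once per row, passing that row as the parameter. In your version, trans(sf['user_score'], sf['weight']) is evaluated before apply is ever called, so trans receives two whole SArray columns instead of one row's dict and weight. That is why the error message complains that an SArray has no attribute 'items'. Note that apply returns an SArray, so assign the result to sf['new'] to get the new column.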
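To make the difference concrete without needing GraphLab installed, here is a minimal pure-Python sketch. The apply_rows helper is a hypothetical stand-in for SFrame.apply (it is not part of any library), and rows are modelled as plain dicts. It only illustrates passing a function versus passing the result of calling one.

```python
def trans(scores, weight):
    """Multiply every value in the dict `scores` by `weight`."""
    return {key: value * weight for key, value in scores.items()}


def apply_rows(rows, fn):
    """Hypothetical stand-in for SFrame.apply: call fn once per row."""
    return [fn(row) for row in rows]


rows = [
    {'id': 1, 'user_score': {'a': 4, 'b': 3}, 'weight': 4},
    {'id': 2, 'user_score': {'a': 5, 'b': 7}, 'weight': 5},
    {'id': 3, 'user_score': {'a': 2, 'b': 3}, 'weight': 2},
]

# Deferred: the lambda is handed over uncalled and runs once per row.
new_column = apply_rows(
    rows, lambda row: trans(row['user_score'], row['weight'])
)

# Eager: trans runs first, on whole columns (lists here), so it fails
# before apply_rows is ever reached.
all_scores = [row['user_score'] for row in rows]
all_weights = [row['weight'] for row in rows]
try:
    apply_rows(rows, trans(all_scores, all_weights))
except AttributeError as err:
    eager_error = str(err)  # a list has no .items(), much like an SArray
```

The lambda version produces one scaled dict per row, while the eager version raises an AttributeError of the same kind as the one in the question.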