I have recently started moving my data exploration code set from pandas to blaze. I am running into the following issue.
Assume:
from blaze import *
s = Data([(1, 'Alice', 100),
          (2, 'Bob', -200),
          (3, 'Charlie', 300),
          (4, 'Denis', 400),
          (5, 'Edith', -500)],
         fields=['id', 'name', 'balance'])
Using pandas.DataFrame via into, we can readily compute something like:
import pandas as pd
into(pd.DataFrame, s).balance.apply(abs)
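For comparison, here is a minimal pandas-only sketch of that computation. It assumes pandas is installed and does not use blaze or into at all: it builds the same rows directly as a DataFrame and shows both the element-wise apply(abs) form and pandas' own vectorized Series.abs().

```python
import pandas as pd

# Same rows as the blaze Data object above, built directly in pandas.
df = pd.DataFrame(
    [(1, 'Alice', 100), (2, 'Bob', -200), (3, 'Charlie', 300),
     (4, 'Denis', 400), (5, 'Edith', -500)],
    columns=['id', 'name', 'balance'],
)

# Element-wise: call the built-in abs on each value of the column.
via_apply = df.balance.apply(abs)

# Vectorized equivalent provided by pandas itself.
via_method = df.balance.abs()

print(via_apply.tolist())  # [100, 200, 300, 400, 500]
```

Series.abs() avoids calling a Python function once per element, which is usually preferable when the operation is a plain absolute value; apply stays useful for arbitrary functions.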
However, I am having serious difficulties trying to do:
s.balance.map(abs, schema='{b: int64}')
This throws TypeError: a bytes-like object is required, not 'int', among other errors.
This issue seems related to "Best approach to apply a function to a column or create a new column by applying a function to another one?", but that question is closed, so I am not sure where to turn.
P.S. If you feel this is trivial and want to downvote the question, please also provide a complete working answer.
Try passing 'int64' as the datashape, rather than passing in a value for schema. It's the second argument of map, so you don't need to name it. The following:
from blaze import *
s = Data([(1, 'Alice', 100),
          (2, 'Bob', -200),
          (3, 'Charlie', 300),
          (4, 'Denis', 400),
          (5, 'Edith', -500)],
         fields=['id', 'name', 'balance'])
s.balance.map(abs, 'int64')
works for me, and produces:
balance
0 100
1 200
2 300
3 400
4 500
P.S. Importing everything from blaze seems to clobber the built-in abs with blaze.expr.abs, but I don't think that matters here.
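If you would rather not depend on which abs a star import leaves in scope, the original built-in is always reachable as builtins.abs. The sketch below uses only the standard library; the module-level abs it defines is a hypothetical stand-in for a star-imported name that shadows the built-in, since blaze itself is not imported here.

```python
import builtins


def abs(x):
    """Hypothetical stand-in for a star-imported `abs` that shadows the built-in."""
    return ("shadowed", x)


balances = [100, -200, 300, 400, -500]

# The bare name now resolves to the module-level definition, not the built-in.
print(abs is builtins.abs)  # False

# The original built-in is still reachable through the builtins module.
fixed = [builtins.abs(b) for b in balances]
print(fixed)  # [100, 200, 300, 400, 500]
```

With blaze loaded, the analogous call would be s.balance.map(builtins.abs, 'int64'); this has not been checked against a particular blaze version. Importing blaze as a module (import blaze) instead of using a star import also leaves the built-in abs untouched.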