
Blaze Data field map throws TypeError


I have recently started moving my data exploration code set from pandas to blaze. I am running into the following issue.

Assume:

from blaze import *
import pandas as pd

s = Data([(1, 'Alice', 100),
          (2, 'Bob', -200),
          (3, 'Charlie', 300),
          (4, 'Denis', 400),
          (5, 'Edith', -500)],
         fields=['id', 'name', 'balance'])

Converting to a pandas.DataFrame via into, we can readily compute something like:

into(pd.DataFrame,s).balance.apply(abs)
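For reference, here is a pandas-only sketch of that baseline, built straight from the same rows. It skips blaze and into entirely, so it only shows the result I am after; the variable names are just for illustration.

```python
import pandas as pd

# Same rows and field names as the blaze Data object above.
rows = [(1, 'Alice', 100),
        (2, 'Bob', -200),
        (3, 'Charlie', 300),
        (4, 'Denis', 400),
        (5, 'Edith', -500)]
df = pd.DataFrame(rows, columns=['id', 'name', 'balance'])

# Apply the built-in abs to every value in the balance column.
abs_balance = df.balance.apply(abs)
print(abs_balance.tolist())
```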

However, I am having serious difficulties trying to do:

s.balance.map(abs,schema='{b: int64}')

This throws TypeError: a bytes-like object is required, not 'int', among other things.

This issue seems related to "Best approach to apply a function to a column or create a new column by applying a function to another one?", which is closed, so I am not sure where to turn.

P.S. If you feel this is trivial and want to mark the question down, please also provide a complete working answer.


Solution

  • Try passing 'int64' as the datashape instead of schema='{b: int64}'. It's the second argument to map, so you don't need to name it. The following:

    from blaze import *
    s = Data([(1, 'Alice', 100),
              (2, 'Bob', -200),
              (3, 'Charlie', 300),
              (4, 'Denis', 400),
              (5, 'Edith', -500)],
              fields=['id', 'name', 'balance'])
    s.balance.map(abs, 'int64')
    

    works for me, and produces:

       balance
    0      100
    1      200
    2      300
    3      400
    4      500
    

    p.s. Importing everything from blaze seems to clobber the built-in abs with blaze.expr.abs, but I don't think that matters here.
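
To illustrate the clobbering mentioned in the p.s., here is a minimal sketch using a hypothetical stand-in module. fake_lib is made up for this example and is not blaze; it only mimics a library that exports its own abs. It shows that a star import rebinds the name, while the built-in stays reachable through the builtins module.

```python
import builtins
import sys
import types

# Hypothetical stand-in for a library that exports its own `abs`.
# This is NOT blaze; it only reproduces the name clash.
fake_lib = types.ModuleType("fake_lib")
fake_lib.abs = lambda x: f"abs({x!r})"  # returns a symbolic string, not a number
fake_lib.__all__ = ["abs"]
sys.modules["fake_lib"] = fake_lib

# Run the star import in a scratch namespace so its effect is easy to inspect.
namespace = {}
exec("from fake_lib import *", namespace)

shadowed_abs = namespace["abs"]
print(shadowed_abs(-200))    # the library's version: abs(-200)
print(builtins.abs(-200))    # the built-in, still available: 200
```

If the clash ever does matter, importing the package under a namespace (for example import blaze as bz) instead of using a star import should keep the built-in abs untouched.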