python, pandas, sparse-matrix

Mathematical operations on sparse-value columns in Pandas produce an error


In pandas, I have a dataframe where I need to generate two matrices of dummies, and then add the columns of one dummy matrix to the other. However, it appears that pandas does not support mathematical operations with two columns of sparse values.

# illustrative example
import pandas as pd

mat = [['cat','black',18],
       ['dog','brown',12],
       ['cat','tabby',9],
       ['mouse','brown',0.2]]
testframe = pd.DataFrame(mat, columns = ['animal','color','weight'])

# create a new dataframe of dummies with columns "cat", "dog", and "mouse"
animals = pd.get_dummies(testframe['animal'], sparse = True)

# the following does not create an error
animals['cat'] + 1

# this does
animals['cat'] + animals['dog']

When running the last line, I get the error "module 'pandas._libs.sparse' has no attribute 'sparse_add_uint8'".

Scalar operations on sparse-valued columns still work; as shown above, I can add a single number to such a column without issue. However, I could not find anything online that explains the error message.

My first thought for a workaround is to simply convert to SciPy sparse matrices and then back to Pandas, but I would prefer to stick with Pandas if possible.


Solution

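  • Thanks to @hpaulj, it appears that this can be solved by passing dtype=np.int64 to pd.get_dummies when creating the dummy values. The error message names a uint8 addition routine, which suggests that the uint8 dtype of the dummy columns is what lacks sparse arithmetic support. Hopefully, this will not be required in future versions of Pandas.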
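
A minimal sketch of that fix, reusing the example frame from the question. The only change is the dtype argument passed to pd.get_dummies; the variable names cat_or_dog and dense_values are introduced here purely for illustration. Behaviour may differ between Pandas versions, so treat this as one way to apply the suggestion rather than a definitive recipe.

```python
import numpy as np
import pandas as pd

mat = [['cat', 'black', 18],
       ['dog', 'brown', 12],
       ['cat', 'tabby', 9],
       ['mouse', 'brown', 0.2]]
testframe = pd.DataFrame(mat, columns=['animal', 'color', 'weight'])

# Request int64 dummies explicitly instead of relying on the default dtype
animals = pd.get_dummies(testframe['animal'], sparse=True, dtype=np.int64)

# Column-to-column addition on the sparse int64 columns
cat_or_dog = animals['cat'] + animals['dog']

# Densify only to inspect the result: rows 0-2 are cat or dog, row 3 is mouse
dense_values = np.asarray(cat_or_dog).tolist()
print(dense_values)  # [1, 1, 1, 0]
```

If memory is a concern, the trade-off is that int64 dummies store each non-zero value in eight bytes rather than one, although the sparse layout still avoids storing the zeros.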