Consider this simple data set whose columns are cut by quantiles.
import numpy as np
import pandas as pd

kyle = pd.DataFrame({'foo': np.random.randint(0, 100, 100), 'boo': np.random.randint(0, 100, 100)})
kyle.loc[:, 'fooCut'] = pd.qcut(kyle.loc[:, 'foo'], np.arange(0, 1.1, .1))
kyle.loc[:, 'booCut'] = pd.qcut(kyle.loc[:, 'boo'], np.arange(0, 1.1, .1))
Previous versions of pandas handled the following as expected:
pd.crosstab(kyle.fooCut,kyle.booCut)
After updating to version 0.24.2, the call above raises TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'
Does anyone know why this happens and how to solve it? Note that kyle.booCut.dtype returns CategoricalDtype, the same type used in the pd.crosstab documentation and its example for categorical variables.
This is a known bug in pandas and is being fixed
As the OP uncovered, this is an issue with pivoting on Interval columns (crosstab is an optimised version of pivot_table under the hood), and it is currently being fixed for v0.25.
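To see where the Interval values come from, here is a minimal sketch. The seeded permutation and the names rng, values and cut are illustrative assumptions, not part of the question's data. It shows that a column produced by pd.qcut is categorical, that its categories form an IntervalIndex, and that its codes are plain integers:

```python
import numpy as np
import pandas as pd

# Illustrative data only: a seeded permutation so the run is reproducible.
rng = np.random.RandomState(0)
values = pd.Series(rng.permutation(100))

cut = pd.qcut(values, 10)  # ten quantile bins

print(cut.dtype.name)            # the column itself is categorical
print(type(cut.cat.categories))  # the categories are an IntervalIndex
print(cut.cat.codes.dtype)       # the codes are plain integers
```

The integer codes are what makes a codes-based cross-tabulation possible without touching the Interval labels.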
Here's a workaround involving crosstabulating the integer codes:
cstab = pd.crosstab(kyle.fooCut.cat.codes, kyle.booCut.cat.codes)
cstab
col_0  0  1  2  3  4  5  6  7  8  9
row_0
0      0  2  0  1  3  1  2  1  1  1
1      1  1  0  1  1  2  1  0  1  2
2      2  1  1  0  1  1  2  0  0  0
3      2  1  3  1  2  0  0  0  0  1
4      1  2  1  0  0  2  0  1  1  2
5      0  2  0  1  0  1  0  3  3  0
6      2  0  1  2  0  2  1  1  1  1
7      1  0  0  2  2  0  1  1  2  0
8      0  1  1  0  1  1  3  1  1  1
9      1  1  2  2  0  0  2  1  0  1
If you want to, you can always assign the index and columns of the result to the actual categories:
cstab.index = kyle.fooCut.cat.categories
cstab.columns = kyle.booCut.cat.categories
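Putting the pieces together, the following is a self-contained sketch of the workaround rather than a definitive fix. It assumes seeded permutation data in place of the random integers from the question, and it adds a reindex step (my addition, not part of the steps above) so that relabelling still lines up if some bin never occurs:

```python
import numpy as np
import pandas as pd

# Illustrative data: seeded permutations instead of np.random.randint.
rng = np.random.RandomState(0)
kyle = pd.DataFrame({"foo": rng.permutation(100), "boo": rng.permutation(100)})
kyle["fooCut"] = pd.qcut(kyle["foo"], 10)
kyle["booCut"] = pd.qcut(kyle["boo"], 10)

foo_cats = kyle["fooCut"].cat.categories
boo_cats = kyle["booCut"].cat.categories

# Cross-tabulate the integer codes instead of the Interval-valued categories.
cstab = pd.crosstab(kyle["fooCut"].cat.codes, kyle["booCut"].cat.codes)

# Make sure every code has a row and a column, even if a bin never occurs.
cstab = cstab.reindex(
    index=range(len(foo_cats)), columns=range(len(boo_cats)), fill_value=0
)

# Code i corresponds to categories[i], so the labels line up positionally.
cstab.index = foo_cats
cstab.columns = boo_cats
print(cstab)
```

Missing values get the code -1, so if the binned columns can contain NaN, decide whether those rows should be dropped or counted separately before relabelling.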