I have tried to get crosstab of data specified by slice.
But something wrong in syntax.
data.csv
like the following
ia,ib,ic,id,ie,if,ig
a,0,0,0,e,0,g
0,b,0,0,e,f,0
0,0,c,d,0,f,g
And then do python3 test.py
like the following
import pandas as pd
import enum
df = pd.read_csv('data.csv')
class Slices(enum.Enum):
first = slice(0, 2)
second = slice(4, 6)
def getCrosstab(*args):
cols1 = []
cols1.append(df.iloc[:, args[0].value])
cols2 = []
cols2.append(df.iloc[:, args[1].value])
print( pd.crosstab(cols1, cols2) )
if __name__ == '__main__':
getCrosstab(Slices.first, Slices.second)
Expected result:
col2 ie if ig
col1
ia 1 0 1
ib 1 1 0
ic 0 1 1
But I had an error:
ValueError: Shape of passed values is (2, 2), indices imply (2, 3)
I can not fully understand the meaning of this error.
Please give me your guidance.
melt
twice, once for each set of columns, and then call crosstab
:
u = (df.melt(['ia', 'ib', 'ic'], var_name='C', value_name='D')
.melt(['C', 'D'], var_name='A', value_name='B')
.query("B != '0' and D != '0'"))
pd.crosstab(u.A, u.C)
C id ie if ig
A
ia 0 1 0 1
ib 0 1 1 0
ic 1 0 1 1
def crosstab_for(df, sliceA, sliceB):
u = (df.reindex(df.columns[sliceA] | df.columns[sliceB], axis=1)
.melt(df.columns[sliceA], var_name='C', value_name='D')
.melt(['C', 'D'], var_name='A', value_name='B')
.query("B != '0' and D != '0'"))
return pd.crosstab(u.A, u.C)
crosstab_for(df, slice(0, 3), slice(4, 7))
C ie if ig
A
ia 1 0 1
ib 1 1 0
ic 0 1 1