I have data like this:
ID Cue trial time accuracy
A apple copy 1450 1
A dog copy 2154 1
A apple test1 2121 0
A dog test2 0 1
A apple final 1231 0
A dog final 5411 1
B apple copy 818 0
B ... ... .. ...
And I need to transform it so that each ID and Cue combination is all on one line:
A apple apple apple copy test1 final 1450 2121 1231 1 0 0
...
In R, I can do it easily with data.table using dcast like this:
dcast(data, ID + Cue ~ rowid(ID, Cue), value.var=c("time", "accuracy"))
But in Pandas I have trouble. If I use pivot_table, it does not come out right. For instance, if I do this:
data.pivot_table(index=['ID', 'Cue', 'TrialType'], values=['time', 'accuracy'])
It always comes out like this:
A apple copy 1450 1
final 1231 0
test1 2121 0
The problem is that I need the "trial" column to not be stacked, but pivot_table / crosstab do not seem capable of handling this. Be aware that if I put the "columns='trial'" in the pivot_table it will not work for my purposes because the 'trial' column is used as a categorical variable in my analysis.
Any hints are appreciated.
Copy column Cue
to new Cue1
for second position by DataFrame.insert
, unpivot by DataFrame.melt
, add counter by GroupBy.cumcount
and reshape by DataFrame.set_index
with Series.unstack
:
df.insert(1, 'Cue1', df['Cue'])
df = df.melt(['ID','Cue'])
df['g'] = df.groupby(['ID','Cue']).cumcount()
df = df.set_index(['ID', 'Cue', 'g'])['value'].unstack().reset_index(level=1, drop=True)
print (df)
g 0 1 2 3 4 5 6 7 8 9 10 11
ID
A apple apple apple copy test1 final 1450 2121 1231 1 0 0
A dog dog dog copy test2 final 2154 0 5411 1 1 1
B apple copy 818 0 NaN NaN NaN NaN NaN NaN NaN NaN
Another idea:
c = df.columns.difference(['ID'], sort=False)
df = (df.groupby(['ID','Cue'])[c]
.apply(lambda x: pd.DataFrame([x.T.values.ravel()]))
.reset_index(level=[1,2], drop=True))
print (df)
0 1 2 3 4 5 6 7 8 9 10 \
ID
A apple apple apple copy test1 final 1450.0 2121.0 1231.0 1.0 0.0
A dog dog dog copy test2 final 2154.0 0.0 5411.0 1.0 1.0
B apple copy 818 0 NaN NaN NaN NaN NaN NaN NaN
11
ID
A 0.0
A 1.0
B NaN