I've been unable to reshape the below dataframe into the long format:
df = pd.DataFrame({'id': [66602088802, 85002620928],
                   't1': ['car', 'house'],
                   't1_pct': [0.46, 0.51],
                   't1_valid': [True, True],
                   't2': ['bike', 'car'],
                   't2_pct': [0.15, 0.07],
                   't2_valid': [True, True],
                   't3': ['car', 'toy'],
                   't3_pct': [0.06, 0.07],
                   't3_valid': [False, False]})
            id     t1  t1_pct  t1_valid    t2  t2_pct  t2_valid   t3  t3_pct  t3_valid
0  66602088802    car    0.46      True  bike    0.15      True  car    0.06     False
1  85002620928  house    0.51      True   car    0.07      True  toy    0.07     False
My desired outcome is below. I've attempted to use pandas.wide_to_long()
but so far no luck. Thanks in advance.
         id  test  value   pct  valid
66602088802     1    car  0.46   True
85002620928     1  house  0.51   True
66602088802     2   bike  0.15   True
85002620928     2    car  0.07   True
66602088802     3    car  0.06  False
85002620928     3    toy  0.07  False
pandas 0.23.4
python 3.7.1
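You can use wide_to_long; the issue is just that your column names need to be changed a bit. wide_to_long expects each wide column to start with its stubname, followed by the separator and the suffix, so the stubnames should be ['pct', 'valid', 'value'] and not t#.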
import pandas as pd
# Reverse the order of the words around '_', e.g. 't1_pct' -> 'pct_t1'
df.columns = ['_'.join(x.split('_')[::-1]) for x in df.columns]
# Add a 'value' stub to the bare columns, e.g. 't1' -> 'value_t1'
df = df.rename(columns={f't{i}': f'value_t{i}' for i in range(1, 4)})
# suffix='.*' keeps the non-numeric suffixes 't1', 't2', 't3' as the 'test' values
pd.wide_to_long(df, stubnames=['pct', 'valid', 'value'],
                i='id', j='test', suffix='.*', sep='_').reset_index()
            id test   pct  valid  value
0  66602088802   t1  0.46   True    car
1  85002620928   t1  0.51   True  house
2  66602088802   t2  0.15   True   bike
3  85002620928   t2  0.07   True    car
4  66602088802   t3  0.06  False    car
5  85002620928   t3  0.07  False    toy
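The result above keeps test as 't1', 't2', 't3' and lists the columns in stub order, while the question asks for 1, 2, 3 and the order value, pct, valid. Below is a minimal sketch of one way to get closer to that, assuming the wide columns always follow the pattern t<number> or t<number>_<name>. The helper to_stub_first is a hypothetical name introduced here for illustration; it is not part of pandas. It renames the columns to value1, pct1, valid1 and so on, so that the default numeric suffix of wide_to_long applies and test comes out as a number.

```python
import pandas as pd

df = pd.DataFrame({'id': [66602088802, 85002620928],
                   't1': ['car', 'house'],
                   't1_pct': [0.46, 0.51],
                   't1_valid': [True, True],
                   't2': ['bike', 'car'],
                   't2_pct': [0.15, 0.07],
                   't2_valid': [True, True],
                   't3': ['car', 'toy'],
                   't3_pct': [0.06, 0.07],
                   't3_valid': [False, False]})


def to_stub_first(col):
    """Hypothetical helper: 't1' -> 'value1', 't1_pct' -> 'pct1'."""
    if col == 'id':
        return col
    prefix, _, kind = col.partition('_')
    # Bare columns such as 't1' have no '_<name>' part, so use 'value'.
    return f"{kind or 'value'}{prefix[1:]}"


# With the default sep='' and numeric suffix, 'test' is parsed as a number.
long_df = (pd.wide_to_long(df.rename(columns=to_stub_first),
                           stubnames=['value', 'pct', 'valid'],
                           i='id', j='test')
             .reset_index()
             .sort_values(['test', 'id'])
             .reset_index(drop=True))

print(long_df)
```

Listing the stubnames as ['value', 'pct', 'valid'] is what puts the columns in the order shown in the question; the explicit sort_values is only there so the row order does not depend on how wide_to_long happens to arrange its result.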