I have a table like this
user company company2 company3 company4
1 Mac Lenovo Hp null
2 Mac MSI Sony
And using pandas I would like it to be
user company
1 Mac
1 Lenovo
1 Hp
2 Mac
and so on Here I tried it but didnt work with pandas pivot.
dataframe = pd.read_csv('data.csv')
dataframe.fillna(value='', inplace=True)
#dataframe.pivot(index='user', columns='company')
Above code doesnt work and gives error.
you can use pd.melt method:
In [211]: pd.melt(df, id_vars='user', value_vars=df.columns.drop('user').tolist())
Out[211]:
user variable value
0 1 company Mac
1 2 company Mac
2 1 company2 Lenovo
3 2 company2 MSI
4 1 company3 Hp
5 2 company3 Sony
6 1 company4 null
7 2 company4 NaN
or
In [213]: pd.melt(df,
id_vars='user', value_vars=df.columns.drop('user').tolist(),
value_name='Company') \
.drop('variable',1)
Out[213]:
user Company
0 1 Mac
1 2 Mac
2 1 Lenovo
3 2 MSI
4 1 Hp
5 2 Sony
6 1 null
7 2 NaN
UPDATE: dropping NaN's and sorting resulting DF by user
:
In [218]: pd.melt(df,
...: id_vars='user', value_vars=df.columns.drop('user').tolist(),
...: value_name='Company') \
...: .drop('variable',1) \
...: .dropna() \
...: .sort_values('user')
...:
Out[218]:
user Company
0 1 Mac
2 1 Lenovo
4 1 Hp
6 1 null
1 2 Mac
3 2 MSI
5 2 Sony
PS if you want to get rid of null
values - use df.replace('null', np.nan)
instead of df
:
In [219]: pd.melt(df.replace('null', np.nan),
...: id_vars='user', value_vars=df.columns.drop('user').tolist(),
...: value_name='Company') \
...: .drop('variable',1) \
...: .dropna() \
...: .sort_values('user')
...:
Out[219]:
user Company
0 1 Mac
2 1 Lenovo
4 1 Hp
1 2 Mac
3 2 MSI
5 2 Sony