Search code examples
pythonpandasdataframeunpivot

Change table to tall format using panda (UNPIVOT)


I have a table like this

   user         company company2 company3 company4
    1           Mac     Lenovo    Hp      null              
    2           Mac       MSI     Sony                          

And using pandas I would like it to be

     user    company
     1          Mac
     1          Lenovo
     1          Hp
     2         Mac

and so on Here I tried it but didnt work with pandas pivot.

dataframe = pd.read_csv('data.csv')
dataframe.fillna(value='', inplace=True)
#dataframe.pivot(index='user', columns='company')

Above code doesnt work and gives error.


Solution

  • you can use pd.melt method:

    In [211]: pd.melt(df, id_vars='user', value_vars=df.columns.drop('user').tolist())
    Out[211]:
       user  variable   value
    0     1   company     Mac
    1     2   company     Mac
    2     1  company2  Lenovo
    3     2  company2     MSI
    4     1  company3      Hp
    5     2  company3    Sony
    6     1  company4    null
    7     2  company4     NaN
    

    or

    In [213]: pd.melt(df,
                      id_vars='user', value_vars=df.columns.drop('user').tolist(),
                      value_name='Company') \
                .drop('variable',1)
    Out[213]:
       user Company
    0     1     Mac
    1     2     Mac
    2     1  Lenovo
    3     2     MSI
    4     1      Hp
    5     2    Sony
    6     1    null
    7     2     NaN
    

    UPDATE: dropping NaN's and sorting resulting DF by user:

    In [218]: pd.melt(df,
         ...:         id_vars='user', value_vars=df.columns.drop('user').tolist(),
         ...:         value_name='Company') \
         ...:   .drop('variable',1) \
         ...:   .dropna() \
         ...:   .sort_values('user')
         ...:
    Out[218]:
       user Company
    0     1     Mac
    2     1  Lenovo
    4     1      Hp
    6     1    null
    1     2     Mac
    3     2     MSI
    5     2    Sony
    

    PS if you want to get rid of null values - use df.replace('null', np.nan) instead of df:

    In [219]: pd.melt(df.replace('null', np.nan),
         ...:         id_vars='user', value_vars=df.columns.drop('user').tolist(),
         ...:         value_name='Company') \
         ...:   .drop('variable',1) \
         ...:   .dropna() \
         ...:   .sort_values('user')
         ...:
    Out[219]:
       user Company
    0     1     Mac
    2     1  Lenovo
    4     1      Hp
    1     2     Mac
    3     2     MSI
    5     2    Sony