Search code examples
pythonpandasdataframemelt

Melt pandas dataframe based on condition


I have a dataframe with the following format

timestamp ID Col1 Col2 Col3 Col4 UsefulCol
16/11/2021 1 0.2 0.1 Col3
17/11/2021 1 0.3 0.8 Col3
17/11/2021 2 10 Col2
17/11/2021 3 0.1 2 Col4

And I want to "melt" it into this format:

timestamp ID Col Value
16/11/2021 1 Col3 0.1
17/11/2021 1 Col3 0.8
17/11/2021 2 Col2 10
17/11/2021 3 Col4 2

How would I go about this?

Input as dataframe:

from numpy import nan
df = pd.DataFrame({'timestamp': ['16/11/2021', '17/11/2021', '17/11/2021', '17/11/2021'],
                   'ID': [1, 1, 2, 3],
                   'Col1': [0.2, 0.3, nan, nan],
                   'Col2': [nan, nan, 10.0, nan],
                   'Col3': [0.1, 0.8, nan, 0.1],
                   'Col4': [nan, nan, nan, 2.0],
                   'UsefulCol': ['Col3', 'Col3', 'Col2', 'Col4']})

Solution

  • Try making a column with the useful values first:

    df['Value'] = df.apply(lambda x: x[x.UsefulCol], axis=1)

    timestamp   ID    Col1    Col2    Col3    Col4    UsefulCol    Value
    16/11/2021  1     0.2             0.1             Col3         0.1
    17/11/2021  1     0.3             0.8             Col3         0.8
    17/11/2021  2              10                     Col2         10
    17/11/2021  3                     0.1     2       Col4         2
    

    Then, you can drop the columns you wanted to melt:

    df.drop(['Col1', 'Col2', 'Col3', 'Col4], axis=1, inplace=True)

    timestamp   ID    UsefulCol    Value
    16/11/2021  1     Col3         0.1
    17/11/2021  1     Col3         0.8
    17/11/2021  2     Col2         10
    17/11/2021  3     Col4         2
    

    Rename your columns if you need:

    df.rename({'UsefulCol':'Col'}, axis=1, inplace=True)

    or

    df.columns = [timestamp', 'ID', 'Col', 'Value]