Search code examples
pythonpandasdataframepandas-melt

Turn the column names of a pandas df into a single column feature (with many nans)


pd.DataFrame({'apple': [np.nan, 'Bad', 'Good'], 'banana': [np.nan, np.nan, np.nan], 'orange': ['Good', np.nan, np.nan]})

Right now, I have a pd dataframe in the format similar to below:

    apple   banana  orange
0   NaN     NaN     Good
1   Bad     NaN     NaN
2   Good    NaN     NaN

In this df, the strings 'Good' or 'Bad' should only appear in each observation once. What I want to do is to convert the fruit column headers into a separate column, and have a third column to indicate the values in the cells, in this format:

    fruit   result  
0   orange  Good
1   apple   Bad
2   apple   Good

I think the fact that 'banana' is dropped from the entries as well as the NaNs around the dataset makes me kinda stumped so to how to proceed.

I've looked into pd.melt and .pivot, but neither of them really seem to be doing exactly what I'm looking for. pd.melt() turns the df into this:

    variable    value
0   apple       NaN
1   apple       Good
2   apple       Good
3   banana      NaN
4   banana      NaN
5   banana      NaN
6   orange      Good
7   orange      NaN
8   orange      NaN

Where 'Bad' gets removed unwantedly, and even if I filter out the NaNs I'm not sure that the prior index of the original df will be retained.

Any help would be appreciated, thank you!


Solution

  • This works:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame({'apple': [np.nan, 'Bad', 'Good'], 'banana': [np.nan, np.nan, np.nan], 'orange': ['Good', np.nan, np.nan]})
    df2 = df.melt().dropna()
    print(df2)
      variable value
    1    apple   Bad
    2    apple  Good
    6   orange  Good