pd.DataFrame({'apple': [np.nan, 'Bad', 'Good'], 'banana': [np.nan, np.nan, np.nan], 'orange': ['Good', np.nan, np.nan]})
Right now, I have a pd dataframe in the format similar to below:
apple banana orange
0 NaN NaN Good
1 Bad NaN NaN
2 Good NaN NaN
In this df, the strings 'Good' or 'Bad' should only appear in each observation once. What I want to do is to convert the fruit column headers into a separate column, and have a third column to indicate the values in the cells, in this format:
fruit result
0 orange Good
1 apple Bad
2 apple Good
I think the fact that 'banana' is dropped from the entries as well as the NaNs around the dataset makes me kinda stumped so to how to proceed.
I've looked into pd.melt and .pivot, but neither of them really seem to be doing exactly what I'm looking for. pd.melt() turns the df into this:
variable value
0 apple NaN
1 apple Good
2 apple Good
3 banana NaN
4 banana NaN
5 banana NaN
6 orange Good
7 orange NaN
8 orange NaN
Where 'Bad' gets removed unwantedly, and even if I filter out the NaNs I'm not sure that the prior index of the original df will be retained.
Any help would be appreciated, thank you!
This works:
import pandas as pd
import numpy as np
df = pd.DataFrame({'apple': [np.nan, 'Bad', 'Good'], 'banana': [np.nan, np.nan, np.nan], 'orange': ['Good', np.nan, np.nan]})
df2 = df.melt().dropna()
print(df2)
variable value
1 apple Bad
2 apple Good
6 orange Good