I am trying to extract only the relevant information from a dataframe. My data looks like
import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': {0: 'id1', 1: 'id1', 2: 'id1'},
'EM': {0: 'met1', 1: 'met2', 2: 'met3'},
'met1_AVG': {0: 0.38, 1: np.nan, 2: np.nan},
'met2_AVG': {0: np.nan, 1: 0.2, 2: np.nan},
'met3_AVG': {0: np.nan, 1: np.nan, 2: 0.58},
'score': {0: 89, 1: 89, 2: 89}})
Please, find my code below. I really would appreciate if someone could help me out. Thank you in advance for your time and helpful assistance
df_melted = df.melt(id_vars=['ID','EM','score']).dropna(subset=['value'])
df_pivoted = pd.pivot_table(data=df_melted,index=['ID','score'],columns=['variable'])
df_ready = df_pivoted.reset_index()
df_ready
Assuming the score
is always same, you can use pandas.DataFrame.groupby.first
:
df.drop("EM",axis=1).groupby("ID", as_index=False).first()
Output:
ID met1_AVG met2_AVG met3_AVG score
0 id1 0.38 0.2 0.58 89