Let's say that I have this dataframe :
data3 = ['ID','ID','','','','','']
data4 = [12,34,465,678,896,'','']
data5 = [8798,67,2313,'','','','']
data6 = [56,67,'','','','','']
df2 = pd.DataFrame(list(zip(data3,data4,data5,data6)),columns = ['Name','Data1','Data2','Data3'])
print(df2)
Name Data1 Data2 Data3
0 ID 12 8798 56
1 ID 34 67 67
2 465 2313
3 678
4 896
5
6
I want to fill the column "Name" with always the same value that we can find it and for all the rows where there are values and drop the rows useless where there is nothing. So I would like to get this result :
Name Data1 Data2 Data3
0 ID 12 8798 56
1 ID 34 67 67
2 ID 465 2313
3 ID 678
4 ID 896
Anyone has an idea efficient to do that ?
Thank you
Use DataFrame.replace
if empty strings, not NaN
s, then DataFrame.dropna
and last forward filling missing values in Name
column by ffill
:
df2 = df2.replace('', np.nan)
df2 = df2.dropna(how='all')
df2['Name'] = df2['Name'].ffill()
print(df2)
Name Data1 Data2 Data3
0 ID 12.0 8798.0 56.0
1 ID 34.0 67.0 67.0
2 ID 465.0 2313.0 NaN
3 ID 678.0 NaN NaN
4 ID 896.0 NaN NaN