I have a dataframe, but I need to perform some cleaning, in some cases the values from above and below rows in column "Code" are separated, is there any way to merge these rows? df - is what I have, and df1 what I need to have (the values can be both in numeric and string format).
import pandas as pd
import numpy as np
df = pd.DataFrame({"DT":[np.nan,65,np.nan, 65,45,np.nan,34,np.nan],
"Code":[56,np.nan,"TD","TR",52,1,np.nan,76],
"DV":[np.nan,"CAT",np.nan,"B54","T65",np.nan,765,np.nan],
"GT":[np.nan,76,np.nan,np.nan,np.nan,np.nan,65,np.nan]})
df1 = pd.DataFrame({"DT":[65,65,45,34],
"Code":["56 TD","TR",52,"1 76"],
"DV":["CAT","B54","T65",765],
"GT":[76,np.nan,np.nan,65]})
Here is my solution:
Original dataset (df):
DT Code DV GT
0 NaN 56 NaN NaN
1 65.0 NaN CAT 76.0
2 NaN TD NaN NaN
3 65.0 TR B54 NaN
4 45.0 52 T65 NaN
5 NaN 1 NaN NaN
6 34.0 NaN 765 65.0
7 NaN 76 NaN NaN
Code:
df['temp1']=df['Code'].shift()
df['temp2']=df['Code'].shift(-1)
df['temp1']=df['temp1'].fillna('')
df['temp2']=df['temp2'].fillna('')
df['temp1']=df['temp1'].astype(str)
df['temp2']=df['temp2'].astype(str)
df['Code']=np.where(df['Code'].isna(), df['temp1']+' '+df['temp2'], df['Code'])
df['temp1']=df['temp1'].replace('', np.nan)
df['temp2']=df['temp2'].replace('', np.nan)
df.dropna(subset=['temp1', 'temp2'], how='any', inplace=True)
df.drop(['temp1', 'temp2'], axis=1, inplace=True)
df.reset_index(inplace=True, drop=True)
df['DT']=df['DT'].astype(int)
print(df)
Output:
DT Code DV GT
0 65 56 TD CAT 76.0
1 65 TR B54 NaN
2 45 52 T65 NaN
3 34 1 76 765 65.0