I have two different csv dataframe files Test1.csv
and Test2.csv
, I would like to match the 'id' column in Test1.csv
to 'id' column in Test2.csv
, and append missing 'id' values into Test2.csv
and update missing values.
Shortly Append False values, and update True values in Test2.csv
Test1.csv
No Price(Op) Price(Cl) id
1 1200 500 a01
2 1400 500 a02
3 1500 600 a03
4 1800 500 a04
5 1000 500 a05
6 1570 800 a06
7 1290 500 a07
8 1357 570 a08
Test2.csv
No Price(Op) Price(Cl) id
1 1200 500 a01
2 1500 500 a03
3 1450 500 a02
4 1800 500 a04
5 1200 500 a05
Desired Output:
Test2.csv
No Price(Op) Price(Cl) id
1 1200 500 a01
2 1500 600 a03
3 1400 500 a02
4 1800 500 a04
5 1000 500 a05
6 1570 800 a06
7 1290 500 a07
8 1357 570 a08
I tried To Loop Over with if statement
ds = pd.read_csv('Test1.csv')
df = pd.read_csv('Test2.csv')
for index, row in ds.iterrows():
if row['id'] in df['id']:
df['Price(Op)'].iloc[idx] = val['Price(Op)']
df['Price(Cl)'].iloc[idx] = val['Price(Cl)']
#What If index Are Different, How Program will know index of same id on other file
else:
df.append(ds.iloc[idx]
Slightly different approach.
No
and id
columns with all values from ds
using right merge so that all values from ds
are used, but order is kept from df
.df
order (No
) with ds
s order (No_ds
) using combine_firstNo_ds
column, sort by No
and reorder columns to desired order.import pandas as pd
ds = pd.DataFrame({'No': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5,
5: 6, 6: 7, 7: 8},
'Price(Op)': {0: 1200, 1: 1400, 2: 1500,
3: 1800, 4: 1000, 5: 1570,
6: 1290, 7: 1357},
'Price(Cl)': {0: 500, 1: 500, 2: 600,
3: 500, 4: 500, 5: 800,
6: 500, 7: 570},
'id': {0: 'a01', 1: 'a02', 2: 'a03',
3: 'a04', 4: 'a05', 5: 'a06',
6: 'a07', 7: 'a08'}})
df = pd.DataFrame({'No': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
'Price(Op)': {0: 1200, 1: 1500, 2: 1450,
3: 1800, 4: 1200},
'Price(Cl)': {0: 500, 1: 500, 2: 500,
3: 500, 4: 500},
'id': {0: 'a01', 1: 'a03', 2: 'a02',
3: 'a04', 4: 'a05'}})
# Merge df with ds
new_df = df[['No', 'id']] \
.merge(ds,
on='id',
how='right',
suffixes=['', '_ds'])
# Merge No Columns Together
new_df['No'] = new_df['No'].combine_first(new_df['No_ds']).astype(int)
# Drop Dup No Column, Sort and Reorder Columns
new_df = new_df \
.drop(columns=['No_ds']) \
.sort_values('No')[['No', 'Price(Op)', 'Price(Cl)', 'id']]
print(new_df.to_string())
Output:
No Price(Op) Price(Cl) id
0 1 1200 500 a01
2 2 1500 600 a03
1 3 1400 500 a02
3 4 1800 500 a04
4 5 1000 500 a05
5 6 1570 800 a06
6 7 1290 500 a07
7 8 1357 570 a08