I have 2 CSV that look like this:
input.csv
:
id | sku | costs_production |
---|---|---|
12 | h01 | |
13 | h02 |
search.csv
:
sku | costs_production |
---|---|
h01 | 14.95 |
h02 | 51.99 |
I want to use the value (sku
) from search.csv
to search for a match in input.csv
and, if it finds a match, it should add the corresponding value from cost_production
to the cost_production
column in input.csv
and output it in a new file.
Here's what I've got so far:
import csv
ID, SKU, COSTS_PRODUCTION = 'id', 'sku', 'costs_production' # Field names referenced.
# Read entire input file into a list.
with open('input.csv', 'r', newline='') as inp:
reader = csv.DictReader(inp)
inputs = list(reader)
# Update input rows that match data in search.csv file.
with open('search.csv', 'r', newline='') as sea:
sea_reader = csv.DictReader(sea)
for row in sea_reader:
SKU, COSTS_PRODUCTION = row[SKU], row[COSTS_PRODUCTION]
for input_ in inputs:
if input_[SKU] == SKU: # Match?
input_[COSTS_PRODUCTION] = row[COSTS_PRODUCTION]
break
# Write updated input.csv data out into a file.
with open('input_updated.csv', 'w', newline='') as outp:
fieldnames = inputs[0].keys()
writer = csv.DictWriter(outp, fieldnames)
writer.writeheader()
writer.writerows(inputs)
print('done')
I keep getting KeyError: 'h01'
.
It would be easier if you consider using dataframe
in pandas
.
For example:
import pandas as pd
input_df = pd.read_csv('input.csv')
search_df = pd.read_csv('search.csv')
merged_df = pd.merge(input_df, search_df, on='sku', how='left')
# Fill the costs_production column with the corresponding values from search.csv
merged_df['costs_production'] = merged_df['costs_production_y'].fillna(merged_df['costs_production_x'])
merged_df = merged_df[['id', 'sku', 'costs_production']]
merged_df.to_csv('input_updated.csv', index=False)
However, If you still consider using csv
, the issue occurs because the variables holding the column keys (SKU
and COSTS_PRODUCTION
) and the variables holding the actual values from the CSV rows share the same names. Specifically, the following line causes the problem:
SKU, COSTS_PRODUCTION = row[SKU], row[COSTS_PRODUCTION]
To resolve this, assign the CSV values to uniquely named variables, like so:
# Update input rows that match data in search.csv file.
with open('search.csv', 'r', newline='') as sea:
sea_reader = csv.DictReader(sea)
for row in sea_reader:
search_sku = row[SKU]
search_costs_production = row[COSTS_PRODUCTION]
for input_ in inputs:
if input_[SKU] == search_sku:
input_[COSTS_PRODUCTION] = search_costs_production
break