I have two csv files, csv1 is the working file that changes, csv2 is static and used as a reference set.
csv1 - input.csv
Name | Type | Owner | Status | Date | Group |
---|---|---|---|---|---|
bananas | fruit | Joe | In Stock | 1/1/23 | |
apples | fruit | jim | Out Of Stock | 1/2/23 | |
tomato | veggie | bob | In Stock | 1/3/23 | |
potato | vegie | tom | Out Of Stock | 1/4/23 | |
kiwi | fruit | jane | In Stock | 1/5/23 | |
chicken | meat | francis | Out Of Stock | 1/6/23 | |
beef | meat | linda | In Stock | 1/7/23 |
csv2 - reference.csv
Type | Group |
---|---|
fruit | 1 |
veggie | 2 |
meat | 3 |
I found this post here https://stackoverflow.com/a/14257599 which helped me get started but it's the processing that doesn't seem to be working
I'm using this code:
with open("input.csv", "r") as csv_input, open("reference.csv", "r")as assign_csv, open("output.csv", "w") as out_file:
reader = csv.reader(csv_input)
reader2 = csv.reader(assign_csv)
writer = csv.writer(out_file)
for error in reader:
writer.writerow(error)
for group in reader2:
if group[0] in error[1]:
error[5] = group[1]
writer.writerow(error)
This reads the input and reference files just fine but at the very bottom in the if statement it's not doing anything and I'm not sure why. Basically I want it to loop through every row in the input.csv and check the value in the Type column and then loop through the reference.csv and if the text is contained there, then write to the Group column in the output.csv.
Currently the code is essentially just replicating input.csv to output.csv without writing anything to the cells for the Group column. I know the loop logic is correct because I tried it with a separate code sample and it worked just fine so I think my problem is the if statement and where I have the writer.writerow(error) line placed.
This kind of work must be done in separate passes of each file: read the reference file first and store the lookup values, then read the other CSV and use the lookup structure.
Python's dict is perfect for holding the lookup keys and values:
lookup: dict[str, str] = {}
with open("input1.csv", newline="") as f:
reader = csv.reader(f)
next(reader) # discard header
for row in reader:
lookup[row[0]] = row[1]
The lookup dict looks like:
{
"fruit": "1",
"veggie": "2",
"meat": "3",
}
With the lookup ready, read the other CSV and check each row's Type to see if it's in lookup. I like to skip rows that don't qualify instead of adding nested if-statements:
new_rows: list[list[str]] = []
with open("input2.csv", newline="") as f:
reader = csv.reader(f)
new_rows.append(next(reader)) # append header
for row in reader:
group = lookup.get(row[1])
if group == None:
continue
row[5] = group
new_rows.append(row)
Then write the final file:
with open("output.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(new_rows)
| Name | Type | Owner | Status | Date | Group |
|---------|--------|---------|--------------|--------|-------|
| bananas | fruit | joe | In Stock | 1/1/23 | 1 |
| apples | fruit | Jim | Out Of Stock | 1/2/23 | 1 |
| tomato | veggie | bob | In Stock | 1/3/23 | 2 |
| kiwi | fruit | jane | In Stock | 1/5/23 | 1 |
| chicken | meat | francis | Out Of Stock | 1/6/23 | 3 |
| beef | meat | linda | In Stock | 1/7/23 | 3 |