Search code examples
pythoncsvreference

Python Read from csv1 and reference csv2 to write to csv3


I have two csv files, csv1 is the working file that changes, csv2 is static and used as a reference set.

csv1 - input.csv

Name Type Owner Status Date Group
bananas fruit Joe In Stock 1/1/23
apples fruit jim Out Of Stock 1/2/23
tomato veggie bob In Stock 1/3/23
potato vegie tom Out Of Stock 1/4/23
kiwi fruit jane In Stock 1/5/23
chicken meat francis Out Of Stock 1/6/23
beef meat linda In Stock 1/7/23

csv2 - reference.csv

Type Group
fruit 1
veggie 2
meat 3

I found this post here https://stackoverflow.com/a/14257599 which helped me get started but it's the processing that doesn't seem to be working

I'm using this code:

with open("input.csv", "r") as csv_input, open("reference.csv", "r")as assign_csv, open("output.csv", "w") as out_file:
reader = csv.reader(csv_input)
reader2 = csv.reader(assign_csv)
writer = csv.writer(out_file)

for error in reader:
    writer.writerow(error)
    for group in reader2:
        if group[0] in error[1]:
            error[5] = group[1]
            writer.writerow(error)

This reads the input and reference files just fine but at the very bottom in the if statement it's not doing anything and I'm not sure why. Basically I want it to loop through every row in the input.csv and check the value in the Type column and then loop through the reference.csv and if the text is contained there, then write to the Group column in the output.csv.

Currently the code is essentially just replicating input.csv to output.csv without writing anything to the cells for the Group column. I know the loop logic is correct because I tried it with a separate code sample and it worked just fine so I think my problem is the if statement and where I have the writer.writerow(error) line placed.


Solution

  • This kind of work must be done in separate passes of each file: read the reference file first and store the lookup values, then read the other CSV and use the lookup structure.

    Python's dict is perfect for holding the lookup keys and values:

    lookup: dict[str, str] = {}
    with open("input1.csv", newline="") as f:
        reader = csv.reader(f)
        next(reader)  # discard header
    
        for row in reader:
            lookup[row[0]] = row[1]
    

    The lookup dict looks like:

    {
        "fruit": "1",
        "veggie": "2",
        "meat": "3",
    }
    

    With the lookup ready, read the other CSV and check each row's Type to see if it's in lookup. I like to skip rows that don't qualify instead of adding nested if-statements:

    new_rows: list[list[str]] = []
    with open("input2.csv", newline="") as f:
        reader = csv.reader(f)
        new_rows.append(next(reader))  # append header
    
        for row in reader:
            group = lookup.get(row[1])
            if group == None:
                continue
    
            row[5] = group
            new_rows.append(row)
    

    Then write the final file:

    with open("output.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(new_rows)
    
    | Name    | Type   | Owner   | Status       | Date   | Group |
    |---------|--------|---------|--------------|--------|-------|
    | bananas | fruit  | joe     | In Stock     | 1/1/23 | 1     |
    | apples  | fruit  | Jim     | Out Of Stock | 1/2/23 | 1     |
    | tomato  | veggie | bob     | In Stock     | 1/3/23 | 2     |
    | kiwi    | fruit  | jane    | In Stock     | 1/5/23 | 1     |
    | chicken | meat   | francis | Out Of Stock | 1/6/23 | 3     |
    | beef    | meat   | linda   | In Stock     | 1/7/23 | 3     |