Search code examples
pythoncsv

Search in CSV and add value if found correct value in another CSV


I have 2 CSV that look like this:

input.csv:

id sku costs_production
12 h01
13 h02

search.csv:

sku costs_production
h01 14.95
h02 51.99

I want to use the value (sku) from search.csv to search for a match in input.csv and, if it finds a match, it should add the corresponding value from cost_production to the cost_production column in input.csv and output it in a new file.

Here's what I've got so far:

    import csv

    ID, SKU, COSTS_PRODUCTION = 'id', 'sku', 'costs_production' # Field names referenced.

    # Read entire input file into a list.
    with open('input.csv', 'r', newline='') as inp:
        reader = csv.DictReader(inp)
        inputs = list(reader)

    # Update input rows that match data in search.csv file.
    with open('search.csv', 'r', newline='') as sea:
        sea_reader = csv.DictReader(sea)
        for row in sea_reader:
            SKU, COSTS_PRODUCTION = row[SKU], row[COSTS_PRODUCTION]
            for input_ in inputs:
                if input_[SKU] == SKU: # Match?
                    input_[COSTS_PRODUCTION] = row[COSTS_PRODUCTION]
                    break

    # Write updated input.csv data out into a file.
    with open('input_updated.csv', 'w', newline='') as outp:
        fieldnames = inputs[0].keys()
        writer = csv.DictWriter(outp, fieldnames)
        writer.writeheader()
        writer.writerows(inputs)

    print('done')

I keep getting KeyError: 'h01'.


Solution

  • It would be easier if you consider using dataframe in pandas.

    For example:

    import pandas as pd
    
    input_df = pd.read_csv('input.csv')
    search_df = pd.read_csv('search.csv')
    
    merged_df = pd.merge(input_df, search_df, on='sku', how='left')
    
    # Fill the costs_production column with the corresponding values from search.csv
    merged_df['costs_production'] = merged_df['costs_production_y'].fillna(merged_df['costs_production_x'])
    
    merged_df = merged_df[['id', 'sku', 'costs_production']]
    merged_df.to_csv('input_updated.csv', index=False)
    

    However, If you still consider using csv, the issue occurs because the variables holding the column keys (SKU and COSTS_PRODUCTION) and the variables holding the actual values from the CSV rows share the same names. Specifically, the following line causes the problem:

    SKU, COSTS_PRODUCTION = row[SKU], row[COSTS_PRODUCTION]
    

    To resolve this, assign the CSV values to uniquely named variables, like so:

    # Update input rows that match data in search.csv file.
    with open('search.csv', 'r', newline='') as sea:
        sea_reader = csv.DictReader(sea)
        for row in sea_reader:
            search_sku = row[SKU]
            search_costs_production = row[COSTS_PRODUCTION]
            for input_ in inputs:
                if input_[SKU] == search_sku:
                    input_[COSTS_PRODUCTION] = search_costs_production
                    break