Search code examples
pythonmergepython-2.7csv-import

Merge CSV files in Python using Python Dictionary


Hi I am trying to create a new CSV file from merge of specific fields in two CSV files based on a common column or primary key. I have tried doing the same thing in powershell and it worked but was very slow in completing the process - more than 30 minutes for merging 5000+ line files so trying this in Python. I am new so please go easy on me.

So two files are infile.csv and checkfile.csv and the columns in the output file created would be based on columns in infile.csv. The code checks the values in checkfile.csv, creates outfile.csv, copies columns from infile.csv and needs to rewrite values for two fields based on corresponding values in checkfile.com. Following are the details

infile.csv -

"StockNumber","SKU","ChannelProfileID","CostPrice"
"10m_s-vid#APTIIAMZ","2VV-10",3746,0.33
"10m_s-vid#CSE","2VV-10",3746,0.98
"1RR-01#CSE","1RR-01",3746
"1RR-01#PCAWS","1RR-01",3746,
"1m_s-vid_ext#APTIIAMZ","2VV-101",3746,0.42

checkfile.csv

ProductCode, Description, Supplier, CostPrice, RRPPrice, Stock, Manufacturer, SupplierProductCode, ManuCode, LeadTime
2VV-03,3MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.43,,930,CDL,2VV-03,2VV-03,1
2VV-05,5MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.54,,1935,CDL,2VV-05,2VV-05,1
2VV-10,10MTR BLACK SVHS M - M GOLD CABLE - B/Q 50,Cables Direct Ltd,0.86,,1991,CDL,2VV-10,2VV-10,1

The outfile.csv I am getting is -

StockNumber,SKU,ChannelProfileID,CostPrice
10m_s-vid#APTIIAMZ,2VV-10,"(' ',)", 
10m_s-vid#CSE,2VV-10,"(' ',)", 
1RR-01#CSE,1RR-01,"(' ',)", 
1RR-01#PCAWS,1RR-01,"(' ',)", 
1m_s-vid_ext#APTIIAMZ,2VV-101,"(' ',)", 

But the outfile.csv I need is -

StockNumber,SKU,ChannelProfileID,CostPrice
10m_s-vid#APTIIAMZ,2VV-10,1991,0.86  
10m_s-vid#CSE,2VV-10,1991,0.86   
1RR-01#CSE,1RR-01
1RR-01#PCAWS,1RR-01          
1m_s-vid_ext#APTIIAMZ,2VV-101

Finally the code -

import csv

with open('checkfile.csv', 'rb') as checkfile:
    checkreader = csv.DictReader(checkfile)

    product_result = dict(
        ((v['ProductCode'], v[' Stock']), (v['ProductCode'], v[' CostPrice']))  for v in checkreader
    )

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)

        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()

        for item in reader:
            result = product_result.get(item['SKU'], " ")

            item['ChannelProfileID'] = result,
            item['CostPrice'] = result

            writer.writerow(item)

Solution

  • You could make it a little simpler:

    import csv
    
    with open('checkfile.csv', 'rb') as checkfile:
        product_result = {
            record['ProductCode']: record for record in csv.DictReader(checkfile)}
    
    with open('infile.csv', 'rb') as infile:
        with open('outfile.csv', 'wb') as outfile:
            reader = csv.DictReader(infile)
            writer = csv.DictWriter(outfile, reader.fieldnames)
            writer.writeheader()
            for item in reader:
                record = product_result.get(item['SKU'], None)
                if record:
                    item['ChannelProfileID'] = record[' Stock']  # ???
                    item['CostPrice'] = record[' CostPrice']
                else:
                    item['ChannelProfileID'] = None
                    item['CostPrice'] = None
                writer.writerow(item)
    

    I wasn't sure about the line which I commented with ???.

    Also, if you really want to produce broken CSV, please feel free to omit the else-clause.

    I tested it with StringIO objects. It produced the result you specified, but with trailing commas, where there was no match in checkfile.

    And I used Python 2.7 dict comprehension, since you tagged your question with python-2.7.