Hi I am trying to create a new CSV file from merge of specific fields in two CSV files based on a common column or primary key. I have tried doing the same thing in powershell and it worked but was very slow in completing the process - more than 30 minutes for merging 5000+ line files so trying this in Python. I am new so please go easy on me.
So two files are infile.csv and checkfile.csv and the columns in the output file created would be based on columns in infile.csv. The code checks the values in checkfile.csv, creates outfile.csv, copies columns from infile.csv and needs to rewrite values for two fields based on corresponding values in checkfile.com. Following are the details
infile.csv -
"StockNumber","SKU","ChannelProfileID","CostPrice"
"10m_s-vid#APTIIAMZ","2VV-10",3746,0.33
"10m_s-vid#CSE","2VV-10",3746,0.98
"1RR-01#CSE","1RR-01",3746
"1RR-01#PCAWS","1RR-01",3746,
"1m_s-vid_ext#APTIIAMZ","2VV-101",3746,0.42
checkfile.csv
ProductCode, Description, Supplier, CostPrice, RRPPrice, Stock, Manufacturer, SupplierProductCode, ManuCode, LeadTime
2VV-03,3MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.43,,930,CDL,2VV-03,2VV-03,1
2VV-05,5MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.54,,1935,CDL,2VV-05,2VV-05,1
2VV-10,10MTR BLACK SVHS M - M GOLD CABLE - B/Q 50,Cables Direct Ltd,0.86,,1991,CDL,2VV-10,2VV-10,1
The outfile.csv I am getting is -
StockNumber,SKU,ChannelProfileID,CostPrice
10m_s-vid#APTIIAMZ,2VV-10,"(' ',)",
10m_s-vid#CSE,2VV-10,"(' ',)",
1RR-01#CSE,1RR-01,"(' ',)",
1RR-01#PCAWS,1RR-01,"(' ',)",
1m_s-vid_ext#APTIIAMZ,2VV-101,"(' ',)",
But the outfile.csv I need is -
StockNumber,SKU,ChannelProfileID,CostPrice
10m_s-vid#APTIIAMZ,2VV-10,1991,0.86
10m_s-vid#CSE,2VV-10,1991,0.86
1RR-01#CSE,1RR-01
1RR-01#PCAWS,1RR-01
1m_s-vid_ext#APTIIAMZ,2VV-101
Finally the code -
import csv
with open('checkfile.csv', 'rb') as checkfile:
checkreader = csv.DictReader(checkfile)
product_result = dict(
((v['ProductCode'], v[' Stock']), (v['ProductCode'], v[' CostPrice'])) for v in checkreader
)
with open('infile.csv', 'rb') as infile:
with open('outfile.csv', 'wb') as outfile:
reader = csv.DictReader(infile)
writer = csv.DictWriter(outfile, reader.fieldnames)
writer.writeheader()
for item in reader:
result = product_result.get(item['SKU'], " ")
item['ChannelProfileID'] = result,
item['CostPrice'] = result
writer.writerow(item)
You could make it a little simpler:
import csv
with open('checkfile.csv', 'rb') as checkfile:
product_result = {
record['ProductCode']: record for record in csv.DictReader(checkfile)}
with open('infile.csv', 'rb') as infile:
with open('outfile.csv', 'wb') as outfile:
reader = csv.DictReader(infile)
writer = csv.DictWriter(outfile, reader.fieldnames)
writer.writeheader()
for item in reader:
record = product_result.get(item['SKU'], None)
if record:
item['ChannelProfileID'] = record[' Stock'] # ???
item['CostPrice'] = record[' CostPrice']
else:
item['ChannelProfileID'] = None
item['CostPrice'] = None
writer.writerow(item)
I wasn't sure about the line which I commented with ???
.
Also, if you really want to produce broken CSV, please feel free to omit the else-clause.
I tested it with StringIO objects. It produced the result you specified, but with trailing commas, where there was no match in checkfile.
And I used Python 2.7 dict comprehension, since you tagged your question with python-2.7.