This may be a simple/repeat question, but I could find/figure out yet how to do it.
I have two csv files:
info.csv:
"Last Name", First Name, ID, phone, adress, age X [Total age: 100] |009076
abc, xyz, 1234, 982-128-0000, pqt,
bcd, uvw, 3124, 813-222-1111, tre,
poi, ccc, 9087, 123-45607890, weq,
and then
age.csv:
student_id,age_1
3124,20
9087,21
1234,45
I want to compare the two csv files, based on the columns "id
" from info.csv and "student_id
" from age.csv and take the corresponding "age_1
" data and put it into the "age
" column in info.csv.
So the final output should be:
info.csv:
"Last Name", First Name, ID, phone, adress, age X [Total age: 100] |009076
abc, xyz, 1234, 982-128-0000, pqt,45
bcd, uvw, 3124, 813-222-1111, tre,20
poi, ccc, 9087, 123-45607890, weq,21
I am able to simply join the tables based on the keys into a new.csv, but can't put the data in the columns titles "age
". I used "csvkit
" to do that.
Here is what I used:
csvjoin -c 3,1 info.csv age.csv > new.csv
Try this...
import csv
info = list(csv.reader(open("info.csv", 'rb')))
age = list(csv.reader(open("age.csv", 'rb')))
def copyCSV(age, info, outFileName = 'out.csv'):
# put age into dict, indexed by ID
# assumes no duplicate entries
# 1 - build a dict ageDict to represent data
ageDict = dict([(entry[0].replace(' ',''), entry[1]) for entry in age[1:] if entry != []])
# 2 - setup output
with open(outFileName, 'wb') as outFile:
outwriter = csv.writer(outFile)
# 3 - run through info and slot in ages and write to output
# nb: had to use .replace(' ','') to strip out whitespaces - these may not be in original .csv
outwriter.writerow(info[0])
for entry in info[1:]:
if entry != []:
key = entry[2].replace(' ','')
if key in ageDict: # checks that you have data from age.csv
entry[5] = ageDict[key]
outwriter.writerow(entry)
copyCSV(age, info)
Let me know if it works or if anything is unclear. I've used a dict because it should be faster if your files are massive, as you only have to loop through the data in age.csv once.
There may be a simpler way / something already implemented...but this should do the trick.