CSV joining based on keys

This may be a simple/repeat question, but I could find/figure out yet how to do it.

I have two csv files:

info.csv:

"Last Name", First Name, ID, phone, adress, age X [Total age: 100] |009076

abc, xyz, 1234, 982-128-0000, pqt,

bcd, uvw, 3124, 813-222-1111, tre, 

poi, ccc, 9087, 123-45607890, weq,

and then

age.csv:

student_id,age_1

3124,20

9087,21

1234,45

I want to compare the two csv files, based on the columns "id" from info.csv and "student_id" from age.csv and take the corresponding "age_1" data and put it into the "age" column in info.csv.

So the final output should be:

info.csv:

"Last Name", First Name, ID, phone, adress, age X [Total age: 100] |009076
 abc, xyz, 1234, 982-128-0000, pqt,45
 bcd, uvw, 3124, 813-222-1111, tre,20
 poi, ccc, 9087, 123-45607890, weq,21

I am able to simply join the tables based on the keys into a new.csv, but can't put the data in the columns titles "age". I used "csvkit" to do that.

Here is what I used:

csvjoin -c 3,1 info.csv age.csv > new.csv

Solution

Try this...

import csv

info = list(csv.reader(open("info.csv", 'rb')))
age = list(csv.reader(open("age.csv", 'rb')))

def copyCSV(age, info, outFileName = 'out.csv'):
    # put age into dict, indexed by ID
    # assumes no duplicate entries

    # 1 - build a dict ageDict to represent data
    ageDict = dict([(entry[0].replace(' ',''), entry[1]) for entry in age[1:] if entry != []])

    # 2 - setup output
    with open(outFileName, 'wb') as outFile:
        outwriter = csv.writer(outFile)
        # 3 - run through info and slot in ages and write to output
        # nb: had to use .replace(' ','') to strip out whitespaces - these may not be in original .csv
        outwriter.writerow(info[0])
        for entry in info[1:]:
            if entry != []:
                key = entry[2].replace(' ','')
                if key in ageDict: # checks that you have data from age.csv
                    entry[5] = ageDict[key]
            outwriter.writerow(entry)

copyCSV(age, info)

Let me know if it works or if anything is unclear. I've used a dict because it should be faster if your files are massive, as you only have to loop through the data in age.csv once.

There may be a simpler way / something already implemented...but this should do the trick.