Search code examples
pythonpython-3.xlistcsvfile-read

Python read from one CSV file search the corresponding row from another csv if there is a element match


I want to read through the first file1.csv and if the policy is present in file2.csv get the specific ID for policy and get the count of that policy ID from file3.csv. So I have 3 csv files file1.csv file2.csv file3.csv as shown below, which has thousands of similar rows

file2.csv
Name   Policies
Raj    12345, 676, 909
Sam    786
Lucy   899, 7676, 09

file2.csv
Policies       ID
676, 8787      212
909,898,707    342
89, 98,09      345

file3.csv
ID  Count
212 56
342 23
345 07

SO finally my final output would look something like this stored in a file or csv. Can use panda, numpy or anything

Final.csv
Name  tuple of [Policies, ID, Count]
Raj     [676,212,56]
Raj     [909, 342, 23]
Lucy    [09, 345, 07]

I'm stuck with the below code:

policyid = csv.reader( 'file2.csv', delimiter=',')
with open('file1.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        data = row['Policies'].split(",")
        if data:
            for policy in data:
                for policy, id in policyid:
                    data2 = policy.split(",")
                        if policy in data2:
                            print id

Solution

  • One way to do this is to read in all three CSV files, grab a value from file1, and scan through file2 and file3 to obtain those values. This is a extra difficult since comma-delimited lists in a field are an antipattern, forcing us to do some extra work to parse text along the way.

    Another way to do this would be to load all three CSV files into either a SQL table or dataframe and do some JOINs, but the comma-delimited lists still make this difficult.

    Here's an example of what I'm describing, although this is admittedly messy:

    import csv
    
    with open('file1.csv') as f:
        reader = csv.DictReader(f)
        next(reader)  # Skip header
        file1 = [row for row in reader]
    with open('file2.csv') as f:
        reader = csv.DictReader(f)
        next(reader)  # Skip header
        file2 = [row for row in reader]
    with open('file3.csv') as f:
        reader = csv.DictReader(f)
        next(reader)  # Skip header
        file3 = [row for row in reader]
    
    
    def get_policy_id(policy):
        for line in file2:
            policies = line['policies'].split(', ')
            if policy in policies:
                return line['ID']
    
    
    def get_id_count(id):
        for line in file3:
            if id == line['id']:
                return line['count']
    
    
    output = []
    for line in file1:
        policies = line['policies'].split(', ')
        for policy in policies:
            id = get_policy_id(policy)
            count = get_id_count(id)
            output.append({'name': line['name'],
                           'policy': policy,
                           'id': id,
                           'count': count})