I want to read through the first file1.csv and if the policy is present in file2.csv get the specific ID for policy and get the count of that policy ID from file3.csv. So I have 3 csv files file1.csv file2.csv file3.csv as shown below, which has thousands of similar rows
file2.csv
Name Policies
Raj 12345, 676, 909
Sam 786
Lucy 899, 7676, 09
file2.csv
Policies ID
676, 8787 212
909,898,707 342
89, 98,09 345
file3.csv
ID Count
212 56
342 23
345 07
SO finally my final output would look something like this stored in a file or csv. Can use panda, numpy or anything
Final.csv
Name tuple of [Policies, ID, Count]
Raj [676,212,56]
Raj [909, 342, 23]
Lucy [09, 345, 07]
I'm stuck with the below code:
policyid = csv.reader( 'file2.csv', delimiter=',')
with open('file1.csv', 'r') as f:
reader = csv.DictReader(f)
for row in reader:
data = row['Policies'].split(",")
if data:
for policy in data:
for policy, id in policyid:
data2 = policy.split(",")
if policy in data2:
print id
One way to do this is to read in all three CSV files, grab a value from file1, and scan through file2 and file3 to obtain those values. This is a extra difficult since comma-delimited lists in a field are an antipattern, forcing us to do some extra work to parse text along the way.
Another way to do this would be to load all three CSV files into either a SQL table or dataframe and do some JOINs, but the comma-delimited lists still make this difficult.
Here's an example of what I'm describing, although this is admittedly messy:
import csv
with open('file1.csv') as f:
reader = csv.DictReader(f)
next(reader) # Skip header
file1 = [row for row in reader]
with open('file2.csv') as f:
reader = csv.DictReader(f)
next(reader) # Skip header
file2 = [row for row in reader]
with open('file3.csv') as f:
reader = csv.DictReader(f)
next(reader) # Skip header
file3 = [row for row in reader]
def get_policy_id(policy):
for line in file2:
policies = line['policies'].split(', ')
if policy in policies:
return line['ID']
def get_id_count(id):
for line in file3:
if id == line['id']:
return line['count']
output = []
for line in file1:
policies = line['policies'].split(', ')
for policy in policies:
id = get_policy_id(policy)
count = get_id_count(id)
output.append({'name': line['name'],
'policy': policy,
'id': id,
'count': count})