I have a file with lines of data. Each line starts with an id, followed by a fixed set of attributes separated by commas.
123,2,kent,...,
123,2,bob,...,
123,2,sarah,...,
123,8,may,...,
154,4,sheila,...,
154,4,jeff,...,
175,3,bob,...,
249,2,jack,...,
249,5,bob,...,
249,3,rose,...,
I would like to extract an attribute when a condition is met: if 'bob' appears within an id, get the value of the second field (the one right after the id) from every row that follows 'bob' within that same id.
For example:
id: 123
values returned: 2, 8
id: 249
values returned: 3
In Java I would use a nested loop for this, but I would like to try it in Python. Any suggestions would be great.
I came up with a (perhaps) more Pythonic solution that uses groupby and dropwhile from itertools. It yields the same result as the first solution further down, but I think it's prettier. :) Flags, a "curr_id" variable and the like are not very Pythonic and are best avoided where possible.
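import csv
from itertools import groupby, dropwhile

goal = 'bob'
ids = {}

with open('my_data.csv') as ifile:
    reader = csv.reader(ifile)
    # Group consecutive rows that share the same id (first column).
    for key, rows in groupby(reader, key=lambda r: r[0]):
        # Skip rows until the goal name appears; keep it and everything after.
        matched_rows = list(dropwhile(lambda r: r[2] != goal, rows))
        # More than one row left means something follows the goal row.
        if len(matched_rows) > 1:
            ids[key] = [row[1] for row in matched_rows[1:]]

print(ids)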
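To make the two itertools steps easier to follow, here is a small sketch that shows what survives dropwhile for each id. It uses the rows from the question, cut down to three fields and held in a list instead of a file; the `rows` and `tails` names are only for illustration.

```python
from itertools import dropwhile, groupby

# Rows from the question, reduced to the three fields that matter here.
rows = [
    ['123', '2', 'kent'], ['123', '2', 'bob'], ['123', '2', 'sarah'],
    ['123', '8', 'may'], ['154', '4', 'sheila'], ['154', '4', 'jeff'],
    ['175', '3', 'bob'], ['249', '2', 'jack'], ['249', '5', 'bob'],
    ['249', '3', 'rose'],
]

# For each id, record the names that are left once dropwhile stops skipping.
tails = {}
for key, group in groupby(rows, key=lambda r: r[0]):
    tails[key] = [r[2] for r in dropwhile(lambda r: r[2] != 'bob', group)]

for key in sorted(tails):
    print('%s: %s' % (key, tails[key]))
```

Id 123 keeps bob, sarah and may; 154 keeps nothing; 175 keeps only bob; 249 keeps bob and rose. That is why the solution only records an id when more than one row is left. Note that groupby only merges consecutive rows with equal keys, so this relies on the file already being grouped by id, as it is in the question.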
(My first solution, for comparison:)
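from collections import defaultdict
import csv

curr_id = None   # id of the group currently being read
found = False    # True once the goal name has been seen in this group
goal = 'bob'
ids = defaultdict(list)

with open('my_data.csv') as ifile:
    for row in csv.reader(ifile):
        # A new id starts a new group, so reset the flag.
        if row[0] != curr_id:
            found = False
            curr_id = row[0]
        if found:
            # Every row after the goal row contributes its second field.
            ids[curr_id].append(row[1])
        elif row[2] == goal:
            found = True

print(dict(ids))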
Output:
{'123': ['2', '8'], '249': ['3']}
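If you want to try this without creating my_data.csv, csv.reader accepts any iterable of lines, so the same flag-based logic can be wrapped in a function and fed a list of strings. This is only a sketch: `values_after` and `sample` are hypothetical names, and the sample lines leave out the elided trailing attributes.

```python
import csv
from collections import defaultdict


def values_after(lines, goal):
    """Hypothetical helper: map each id to the second-field values
    of the rows that follow `goal` within that id."""
    ids = defaultdict(list)
    curr_id = None
    found = False
    for row in csv.reader(lines):
        if row[0] != curr_id:
            # New id: start a fresh group.
            curr_id, found = row[0], False
        if found:
            ids[curr_id].append(row[1])
        elif row[2] == goal:
            found = True
    return dict(ids)


# Sample data from the question, without the trailing attributes.
sample = """123,2,kent
123,2,bob
123,2,sarah
123,8,may
154,4,sheila
154,4,jeff
175,3,bob
249,2,jack
249,5,bob
249,3,rose""".splitlines()

result = values_after(sample, 'bob')
print(result == {'123': ['2', '8'], '249': ['3']})
```

The final line prints True, matching the output shown above.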