Tags: python, for-loop, file-io, readlines

Getting values from file if match found for lines with same id in Python


I have a file with lines of data. Each line starts with an id, followed by a fixed set of attributes separated by commas.

123,2,kent,...,
123,2,bob,...,
123,2,sarah,...,
123,8,may,...,

154,4,sheila,...,
154,4,jeff,...,

175,3,bob,...,

249,2,jack,...,
249,5,bob,...,
249,3,rose,...,

I would like to extract an attribute when a condition is met: if 'bob' appears in a line for a given id, return the second field of every line that follows it within the same id.

For example:

id: 123
values returned: 2, 8

id: 249
values returned: 3

In Java I would use a nested loop, but I would like to try this in Python. Any suggestions would be great.


Solution

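  • I came up with a (perhaps) more Pythonic solution that uses itertools.groupby and itertools.dropwhile. It yields the same result as the second method below, but I think it's prettier. :) Flags, "curr_id" variables, and similar manual state tracking are not very Pythonic and are best avoided where possible. Note that groupby only groups consecutive rows, so this assumes lines with the same id are adjacent in the file.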

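    import csv
    from itertools import groupby, dropwhile
    
    goal = 'bob'
    ids = {}
    
    with open('my_data.csv', newline='') as ifile:
        # Skip blank lines, which csv.reader yields as empty lists
        reader = (row for row in csv.reader(ifile) if row)
        # groupby collects consecutive rows that share the same id (column 0)
        for key, rows in groupby(reader, key=lambda r: r[0]):
            # Drop rows until the first match; the match itself stays at index 0
            matched_rows = list(dropwhile(lambda r: r[2] != goal, rows))
            if len(matched_rows) > 1:
                # Keep the second column of every row after the match
                ids[key] = [row[1] for row in matched_rows[1:]]
    
    print(ids)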
    

    (My first solution, kept for comparison:)

    from collections import defaultdict
    import csv
    
    curr_id = None
    found = False
    goal = 'bob'
    ids = defaultdict(list)
    
    with open('my_data.csv', newline='') as ifile:
        for row in csv.reader(ifile):
            if not row:
                continue  # skip blank lines
            if row[0] != curr_id:
                # New id: reset the flag
                found = False
                curr_id = row[0]
            if found:
                # A match was seen earlier in this id: collect column 1
                ids[curr_id].append(row[1])
            elif row[2] == goal:
                found = True
    
    print(dict(ids))
    

    Output:

    {'123': ['2', '8'], '249': ['3']}
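
To try the groupby approach without creating a file, here is a minimal sketch that wraps it in a function and runs it on the sample lines from the question. The function name values_after_match and its column parameters are made up for illustration, and the sample rows are cut down to three fields because the question elides the rest.

```python
import csv
from itertools import dropwhile, groupby


def values_after_match(lines, goal, id_col=0, value_col=1, name_col=2):
    """Map each id to the value_col entries of the rows that follow
    the first row whose name_col equals goal, within that id."""
    # csv.reader accepts any iterable of strings; blank lines come back
    # as empty lists, so filter them out before indexing.
    rows = (row for row in csv.reader(lines) if row)
    result = {}
    # groupby only groups consecutive rows, so ids must be adjacent.
    for key, group in groupby(rows, key=lambda r: r[id_col]):
        # Skip rows before the first match; the match lands at index 0.
        matched = list(dropwhile(lambda r: r[name_col] != goal, group))
        if len(matched) > 1:
            result[key] = [r[value_col] for r in matched[1:]]
    return result


# Sample data from the question, truncated to the first three fields
# (an assumption: the remaining fields are not shown in the question).
sample = [
    "123,2,kent",
    "123,2,bob",
    "123,2,sarah",
    "123,8,may",
    "",
    "154,4,sheila",
    "154,4,jeff",
    "",
    "175,3,bob",
    "",
    "249,2,jack",
    "249,5,bob",
    "249,3,rose",
]

print(values_after_match(sample, "bob"))
```

Ids with no match (154) or with the match on the last line (175) are left out of the result, which matches the behaviour of both solutions above.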