Multiple list comprehensions return empty list with context manager


I'm reading a gzipped CSV file and would like to extract only specific columns without using pandas. My current code returns a populated list only for the first list comprehension; the following ones come back empty. How can I extract multiple columns while using a context manager?

Input file:

col1,col2,col3
1,2,3
a,b,c

My code:

import gzip
import csv
import codecs

with gzip.open(r"myfile.csv.gz", "r") as f:
    content = csv.reader(codecs.iterdecode(f, "utf-8"))

    col_2 = [row[1] for row in content] # Returns [2, "b"]
    col_3 = [row[2] for row in content] # Returns []

Expected output:

col_2: [2, "b"]
col_3: [3, "c"]

Solution

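  • The issue is not caused by the context manager. csv.reader returns an iterator, which can only be consumed once: the first list comprehension exhausts it, so the second one has nothing left to read.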
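
The one-pass behaviour can be reproduced without any file or context manager. The following is a minimal sketch, assuming an in-memory `io.StringIO` buffer as a stand-in for the decoded gzip stream from the question (the names `sample`, `first_pass` and `second_pass` are illustrative only):

```python
import csv
import io

# Hypothetical in-memory stand-in for the decoded file contents.
sample = io.StringIO("col1,col2,col3\n1,2,3\na,b,c\n")
content = csv.reader(sample)

first_pass = [row[1] for row in content]   # consumes every row of the reader
second_pass = [row[2] for row in content]  # the reader is already exhausted

print(first_pass)   # ['col2', '2', 'b']
print(second_pass)  # []
```

The second list is empty even though no `with` block is involved, which suggests the context manager is not the cause.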

    You can duplicate the iterator using itertools.tee:

    import gzip
    import csv
    import codecs
    from itertools import tee

    with gzip.open(r"myfile.csv.gz", "r") as f:
        content = csv.reader(codecs.iterdecode(f, "utf-8"))

        # tee returns two independent iterators; do not use content after this point
        c1, c2 = tee(content)

        col_2 = [row[1] for row in c1]
        col_3 = [row[2] for row in c2]
    

    Output:

    >>> col_2
    ['col2', '2', 'b']
    
    >>> col_3
    ['col3', '3', 'c']
    

    Using a classical loop

    A better method, however, is a classical loop. It walks the rows only once and avoids tee, which has to buffer every row in memory while c1 is consumed before c2:

    import gzip
    import csv
    import codecs

    with gzip.open(r"myfile.csv.gz", "r") as f:
        content = csv.reader(codecs.iterdecode(f, "utf-8"))

        col_2 = []
        col_3 = []
        for row in content:  # a single pass fills both lists
            col_2.append(row[1])
            col_3.append(row[2])
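
Note that the header row ends up in both lists in the output shown earlier, whereas the expected output in the question leaves it out. One possible way to drop it is to consume the first row with `next()` before looping. The sketch below is a variation, not the code from the answer: it assumes the file can be opened in text mode (`"rt"`) instead of going through `codecs.iterdecode`, and it first writes the sample input to a temporary file so that it runs on its own.

```python
import csv
import gzip
import os
import tempfile

# Write the sample input from the question to a temporary gzip file
# (a hypothetical location, only so that the example is self-contained).
path = os.path.join(tempfile.mkdtemp(), "myfile.csv.gz")
with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
    f.write("col1,col2,col3\n1,2,3\na,b,c\n")

# Text mode ("rt") lets gzip do the decoding that codecs.iterdecode did before.
with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
    content = csv.reader(f)
    header = next(content)  # consume the header row so it is not collected

    col_2 = []
    col_3 = []
    for row in content:
        col_2.append(row[1])
        col_3.append(row[2])

print(col_2)  # ['2', 'b']
print(col_3)  # ['3', 'c']
```

The values are still strings, because csv.reader does not convert types; getting the integer 2 from the expected output would need an explicit conversion.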