Tags: python, file, python-2.7, python-itertools

Take the next n lines from a file until EOF is reached


I have a generator function that yields specific columns from each row of a CSV file as a list. I append these lists to another list until a limit of n is reached. The problem is...

LIMIT = 10

def read_csv(filename):
    with open(filename, 'r') as infile:
        header = next(infile)
        for line in infile:
            # get column by header and append to mylist
            # (placeholder: split the row; the real column selection is omitted)
            mylist = line.rstrip('\n').split(',')
            yield mylist

new_list = []
for dataset in read_csv('some.csv'):
    new_list.append(dataset)
    if len(new_list) == LIMIT:
        # call a func to create xml file with new_list
        new_list = []
else:
    # grab the remaining data (fewer than LIMIT rows left over)
    if new_list:
        # call a func to create xml file with new_list
        new_list = []

...this (ugly) for/else workaround. I've read about itertools.islice and itertools.takewhile. How would you write this without using a for/else?

import itertools
for dataset in itertools.islice(read_csv('some.csv'), LIMIT):
    new_list.append(dataset)

I'm stuck here because I have to find a way to catch islice's StopIteration and repeat it until read_csv() is exhausted.
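For reference, islice never lets StopIteration escape to the caller: once the underlying iterator runs out, it just returns a shorter (possibly empty) slice. A minimal check, using a small in-memory iterator as a stand-in for read_csv('some.csv'):

```python
from itertools import islice

# A short in-memory iterator stands in for read_csv('some.csv').
rows = iter(['a', 'b', 'c'])

first = list(islice(rows, 2))   # ['a', 'b']
second = list(islice(rows, 2))  # ['c']  (only one item was left)
third = list(islice(rows, 2))   # []     (iterator exhausted, no exception raised)
print(first, second, third)
```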

Any ideas?


Solution

  • A for loop over islice won't raise StopIteration (the loop handles it internally), so there's no need to worry about that, and islice also stops cleanly at EOF. So at the end of the loop you can simply call the function that creates the XML file from the collected data. Instead of looping over islice, though, I'd suggest simply calling list() on it to get its items as a list.

    from itertools import islice

    data = read_csv('some.csv')
    new_list = list(islice(data, LIMIT))
    # call a func to create xml file with new_list
    # do something with the remaining `data`
    

    Or, if you want to break the data from read_csv into chunks of size LIMIT, you can use the grouper recipe from the itertools documentation:

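    from itertools import izip_longest  # zip_longest on Python 3

    def grouper(iterable, n, fillvalue=None):
        # n references to the *same* iterator, so each tuple
        # takes n consecutive items from it
        args = [iter(iterable)] * n
        return izip_longest(*args, fillvalue=fillvalue)

    for dataset in grouper(read_csv('some.csv'), LIMIT, fillvalue=''):
        # call a func to create xml file with dataset
        pass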
    

    Note that if the number of items yielded by read_csv is not an exact multiple of LIMIT, the last dataset will be padded with the '' fill value.
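If you'd rather avoid the padding altogether, the "repeat islice until it comes back empty" idea from the question can be written as a small helper. This is a minimal sketch, assuming a hypothetical helper named `chunks`, with a generator expression standing in for read_csv('some.csv'). It relies on the two-argument form iter(callable, sentinel), which keeps calling the lambda until it returns the sentinel `[]`:

```python
from itertools import islice


def chunks(iterable, n):
    """Yield successive lists of up to n items from iterable (hypothetical helper).

    Unlike the grouper recipe, the final chunk is simply shorter
    instead of being padded with a fill value.
    """
    it = iter(iterable)
    # iter(callable, sentinel) stops as soon as islice returns an empty
    # list, i.e. once the underlying iterator is exhausted.
    return iter(lambda: list(islice(it, n)), [])


# Stand-in for read_csv('some.csv'): 23 single-column rows.
rows = (['row%d' % i] for i in range(23))
for dataset in chunks(rows, 10):
    # call a func to create xml file with dataset
    print(len(dataset))  # prints 10, then 10, then 3
```

On Python 3.12 and later, itertools.batched(iterable, n) provides the same behavior directly, yielding tuples where the last one may be shorter.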