python, nested-loops, break, logical-operators, continue

Function only prints value for four files from a directory of 5K files, Python


I am running code that takes each row of a CSV file and searches for an exact match of the entity in each file of a directory. The problem is that the code stops printing matches after four files, even though the directory contains 5K files. I think the issue is with my break or continue statement. Can someone please help me with this? My code so far:

import csv
import os
import re


path = 'C:\\Users\\Lenovo\\.spyder-py3\\5KFILES\\'

with open('C:\\Users\\Lenovo\\.spyder-py3\\codes_file.csv', newline='', encoding ='utf-8') as myFile:
    reader = csv.reader(myFile)
    for filenames in os.listdir(path):
        with open(os.path.join(path, filenames), encoding = 'utf-8') as my:
            content = my.read().lower()
            #print(content)
            for row in reader:
                if len(row[1])>=4:

                #v = re.search(r'(?<!\w){}(?!\w)'.format(re.escape(row[1])), content, re.I)
                    v = re.search(r'\b' + re.escape(row[1]) + r'\b', content, re.IGNORECASE)
                    if v: 
                        print(filenames,v.group(0))
                        break

Solution

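  • reader is created once, before your for loop over the files, and it is an iterator. Every time execution reaches the for row in reader line, iteration resumes where it stopped on the previous pass instead of starting over. Once reader reaches the end of the CSV, every later for row in reader loop is empty, so no further files can produce a match.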

    You can see what happens in this short example:

    l = [0, 1, 2, 3, 4, 5]
    iterator = iter(l)
    
    for i in range(0, 16, 2):
        print('i:', i, "- starting the 'for j ...' loop")
        for j in iterator:
            print('iterator:', j)
            if j == i:
                break
    
    Output:

    i: 0 - starting the 'for j ...' loop
    iterator: 0
    i: 2 - starting the 'for j ...' loop
    iterator: 1
    iterator: 2
    i: 4 - starting the 'for j ...' loop
    iterator: 3
    iterator: 4
    i: 6 - starting the 'for j ...' loop
    iterator: 5
    i: 8 - starting the 'for j ...' loop
    i: 10 - starting the 'for j ...' loop
    i: 12 - starting the 'for j ...' loop
    i: 14 - starting the 'for j ...' loop
    

    Each time the for loop executes, it continues to iterate on iterator where it had stopped before. Once the iterator is exhausted, the for j... loops are empty.

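    You should restart it for each file. Creating a new csv.reader alone is not enough, because the underlying file object is itself an iterator that remembers its position, so rewind the file first:

    myFile.seek(0)
    for row in csv.reader(myFile):
        ....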
    

    Or read the rows into a list once. Unlike an iterator, a list can be looped over any number of times:

    reader = list(csv.reader(myFile))
    
    ....
    
    for row in reader:
        ....
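
Putting the list approach together, the question's code might be reworked along these lines. This is a minimal sketch, not a definitive implementation: the function name find_matches and the min_length parameter are made up for illustration, it assumes the directory contains only UTF-8 text files, and it returns the matches as a list rather than printing them so the result is easier to check.

```python
import csv
import os
import re


def find_matches(csv_path, directory, min_length=4):
    """Return (filename, matched_text) for the first CSV entity found in each file."""
    # Read the CSV once into a list so it can be iterated again for every file.
    with open(csv_path, newline='', encoding='utf-8') as csv_file:
        rows = list(csv.reader(csv_file))

    # Keep only the second column, skipping short or missing values.
    entities = [row[1] for row in rows
                if len(row) > 1 and len(row[1]) >= min_length]

    matches = []
    for filename in sorted(os.listdir(directory)):
        with open(os.path.join(directory, filename), encoding='utf-8') as f:
            content = f.read().lower()
        for entity in entities:
            v = re.search(r'\b' + re.escape(entity) + r'\b', content, re.IGNORECASE)
            if v:
                matches.append((filename, v.group(0)))
                break  # stop at the first matching entity for this file
    return matches
```

It could then be called with the paths from the question, for example for filename, text in find_matches(csv_path, path): print(filename, text). Because entities is a list, the inner loop starts from the first row for every file, and break only ends the search within the current file.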