I am running code that takes each row of a CSV file and looks for an exact match of the entity in each file of a directory. The problem is that the code terminates after printing matching values for four files, whereas there are 5K files in the directory. I think the issue is with my break or continue statement. Can someone please help me with this? Code so far:
import csv
import os
import re

path = 'C:\\Users\\Lenovo\\.spyder-py3\\5KFILES\\'
with open('C:\\Users\\Lenovo\\.spyder-py3\\codes_file.csv', newline='', encoding='utf-8') as myFile:
    reader = csv.reader(myFile)
    for filenames in os.listdir(path):
        with open(os.path.join(path, filenames), encoding='utf-8') as my:
            content = my.read().lower()
            #print(content)
            for row in reader:
                if len(row[1]) >= 4:
                    #v = re.search(r'(?<!\w){}(?!\w)'.format(re.escape(row[1])), content, re.I)
                    v = re.search(r'\b' + re.escape(row[1]) + r'\b', content, re.IGNORECASE)
                    if v:
                        print(filenames, v.group(0))
                        break
reader is created once, before your outer for loop, and it is an iterator. Every time you reach the for row in reader line, iteration continues where it had stopped. Once you reach the end of reader, all following for row in reader loops are empty loops.
You can see what happens in this short example:
l = [0, 1, 2, 3, 4, 5]
iterator = iter(l)
for i in range(0, 16, 2):
    print('i:', i, "- starting the 'for j ...' loop")
    for j in iterator:
        print('iterator:', j)
        if j == i:
            break
i: 0 - starting the 'for j ...' loop
iterator: 0
i: 2 - starting the 'for j ...' loop
iterator: 1
iterator: 2
i: 4 - starting the 'for j ...' loop
iterator: 3
iterator: 4
i: 6 - starting the 'for j ...' loop
iterator: 5
i: 8 - starting the 'for j ...' loop
i: 10 - starting the 'for j ...' loop
i: 12 - starting the 'for j ...' loop
i: 14 - starting the 'for j ...' loop
Each time the for j loop executes, it continues to iterate on iterator where it had stopped before. Once the iterator is exhausted, the for j ... loops are empty.
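You should restart it on each loop. Creating a new csv.reader on the same open file is not enough by itself, because the file position stays where the previous reader stopped, so rewind the file first:
myFile.seek(0)
for row in csv.reader(myFile):
    ....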
or make a list:
reader = list(csv.reader(myFile))
....
for row in reader:
    ....
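Putting the list approach together with the code from the question, a minimal sketch could look like the following. The function name find_first_matches and the min_len parameter are made up for illustration and are not part of the original code. It assumes the search term is in the second CSV column, as in the question, and it returns the matches instead of printing them.

```python
import csv
import os
import re


def find_first_matches(csv_path, dir_path, min_len=4):
    """Return (filename, matched_text) for the first CSV term found in each file.

    Hypothetical helper: the name and the min_len parameter are illustrative.
    """
    # Read the CSV once into a list, so it can be traversed again for every file.
    with open(csv_path, newline='', encoding='utf-8') as csv_file:
        terms = [row[1] for row in csv.reader(csv_file)
                 if len(row) > 1 and len(row[1]) >= min_len]

    # Compile each whole-word pattern once instead of once per file.
    patterns = [re.compile(r'\b' + re.escape(term) + r'\b', re.IGNORECASE)
                for term in terms]

    results = []
    for filename in sorted(os.listdir(dir_path)):
        full_path = os.path.join(dir_path, filename)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories
        with open(full_path, encoding='utf-8') as text_file:
            content = text_file.read()
        for pattern in patterns:
            match = pattern.search(content)
            if match:
                results.append((filename, match.group(0)))
                break  # stop at the first matching term for this file
    return results
```

Because terms is a list, the inner loop starts from the first term for every file, so the break only ends the search for the current file. With re.IGNORECASE the .lower() call from the question is no longer needed, and match.group(0) keeps the original casing from the file.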