I have a 14-page CSV document (actually a PDF converted to a CSV file). Each page has 36 lines, and there are 544 lines in total. Within those lines I need to ignore lines 325-347 and 493-507 (which already works correctly when a page start is identified only by the CO= and PAGE condition).
Totals Lines - 544
Page Indexes - [0, 36, 72, 108, 144, 180, 216, 252, 288, 348, 384, 420, 456, 508]
[[0, 35], [36, 71], [72, 107], [108, 143], [144, 179], [180, 215], [216, 251], [252, 287], [288, 347], [348, 383], [384, 419], [420, 455], [456, 507], [508, 544]]
With the code below I am getting "IndexError: list index out of range". Any help would be appreciated. Thanks in advance.
import csv

file = 'file name'
with open(file, 'r') as fp:
    csv_reader = list(csv.reader(fp))
num_rows = len(csv_reader)
print("Totals Lines - " + str(num_rows))
# A page starts on a row whose first cell is 'CO=' and that contains a 'PAGE' cell
page_indexes = [i for i in range(num_rows) if 'PAGE' in csv_reader[i] and csv_reader[i][0].strip() == 'CO=']
print("Page Indexes - " + str(page_indexes))
page_nums = [[page_indexes[i], page_indexes[i+1]-1] for i in range(len(page_indexes))]
print(page_nums)
On the last iteration of the list comprehension for page_nums, there's no page_indexes[i+1], since i is the index of the last element. You need to stop before that. Then you can add the last page index separately.
page_nums = [ [ page_indexes[i], page_indexes[i+1]-1 ] for i in range(len(page_indexes)-1)]
page_nums.append([page_indexes[-1], num_rows-1])
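As an alternative to appending the last page separately, the same ranges can be built by pairing each page start with the start of the next page. This is only a sketch: the helper name page_ranges is hypothetical, the page indexes and row count are copied from the output in the question, and it assumes the last page runs to the final row of the file (index num_rows - 1).

```python
def page_ranges(page_indexes, num_rows):
    """Return inclusive [start, end] row ranges, one per page (hypothetical helper)."""
    # Each page ends one row before the next page starts;
    # num_rows acts as the "next start" for the last page.
    next_starts = page_indexes[1:] + [num_rows]
    return [[start, nxt - 1] for start, nxt in zip(page_indexes, next_starts)]


# Values taken from the question's printed output.
page_indexes = [0, 36, 72, 108, 144, 180, 216, 252, 288, 348, 384, 420, 456, 508]
page_nums = page_ranges(page_indexes, 544)
print(page_nums[0], page_nums[-1])  # [0, 35] [508, 543]
```

Because zip stops at the shorter input and no index past the end is ever read, this version cannot raise the IndexError, and it returns an empty list when no page markers are found.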