I am processing a shell script in Python. My first step is to comb through the file and save only the important lines in a list (of strings). However, I have isolated a problem where every second line is ignored. Why is the second, fourth, etc. line skipped in the following code?
f = open("sample_failing_file.txt", encoding="ISO-8859-1")
readfile = f.read()
filelines = readfile.split("\n")

def remove_irrelevant_lines(filecontent: list[str]) -> list[str]:
    for line in filecontent:
        if drop_line_appropriate(line):
            filecontent.remove(line)
    return filecontent

def drop_line_appropriate(line: str) -> bool:
    if line.startswith("#"):
        return True
    # some more conditions, omitted here
    return False

filelines = remove_irrelevant_lines(filelines)
f.close()
When I run this code, I can see that filecontent is complete. However, when I inspect line inside the loop, I can see that some lines (e.g. "# some line 3") are never visited. Here is a simplified version of the shell script on which my Python script fails (sample_failing_file.txt):
#!/bin/sh
#
# some line 1
#
# some line 2
# some line 3
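As was pointed out in the comments, you shouldn't remove elements from a list while iterating over it. The for loop keeps an internal index into the list; when you remove the current element, every later element shifts one position to the left, so the next iteration skips the element that just moved into the current slot. That is why every second line is never visited. Additionally, you don't want to use list.remove() to delete lines: it searches the list from the start for the first equal element, so each call is O(n) and the whole loop is O(n²), and with duplicate lines it may remove a different occurrence than the one you are currently looking at.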
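To make the skipping visible, here is a minimal sketch that reproduces the behaviour on the sample lines from the question. The `visited` list is a hypothetical helper added only for illustration; it records which elements the loop actually sees.

```python
# The six lines of sample_failing_file.txt from the question.
lines = ["#!/bin/sh", "#", "# some line 1", "#", "# some line 2", "# some line 3"]

visited = []  # hypothetical helper: records what the loop actually iterates over
for line in lines:
    visited.append(line)
    if line.startswith("#"):
        # Removing shifts all later elements left by one,
        # so the loop's next index lands one element too far.
        lines.remove(line)

print(visited)  # ['#!/bin/sh', '# some line 1', '# some line 2']
print(lines)    # ['#', '#', '# some line 3']
```

Only every other line is visited, and the lines that were skipped (including "# some line 3") are still in the list afterwards.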
The following should fix your problem and also run vastly faster:
def remove_irrelevant_lines(filecontent: list[str]) -> list[str]:
    return [line for line in filecontent if not drop_line_appropriate(line)]
This creates and returns a new list, filtering out the lines indicated by drop_line_appropriate.
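As a rough end-to-end sketch of the filtering approach: the `SAMPLE` string below stands in for the file contents, and the `echo hello` and `ls -l` lines are made up for illustration so that something survives the filter. `drop_line_appropriate` is reduced to the one condition shown in the question.

```python
# Stand-in for the file contents; the non-comment lines are hypothetical.
SAMPLE = """#!/bin/sh
#
# some line 1
echo hello
# some line 2
ls -l
"""

def drop_line_appropriate(line: str) -> bool:
    # Only the condition shown in the question; the others were omitted there.
    return line.startswith("#")

def remove_irrelevant_lines(filecontent: list[str]) -> list[str]:
    # Build a new list instead of mutating the one being iterated.
    return [line for line in filecontent if not drop_line_appropriate(line)]

original = SAMPLE.splitlines()
kept = remove_irrelevant_lines(original)
print(kept)           # ['echo hello', 'ls -l']
print(len(original))  # 6, the input list is left untouched
```

If other code holds a reference to the original list and needs to see the change, assigning to a slice, `original[:] = remove_irrelevant_lines(original)`, replaces the contents in place while still avoiding removal during iteration.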