I have input files such as "input.dat" containing values like this:
41611 2014 12 18 0 0
41615 2014 12 18 0 0
41625 2014 12 18 0 0
41640 2014 6 14 3 3
42248 2014 12 18 0 0
42323 2014 12 18 0 0
42330 2014 8 13 7 7
42334 2014 12 18 0 0
42335 2014 12 18 0 0
...
I have many dataset files and they contain a lot of unwanted data. How can I delete many rows at once, in this case the rows starting with 41640 and 42330, along with their entire row values? For now I use this script:
with open(path+fname, "r") as input:
    with open("00-new.dat", "wb") as output:
        for line in input:
            if line != "41640" + "\n":
                output.write(line)
The result: the value 41640 still exists in the output. Any ideas?
You need to change your condition: as written, it checks whether the whole line is exactly equal to "41640\n". Each line, however, contains the whole row of data you are reading followed by a "\n", so the comparison never matches. A fixed version of your program looks like this:
with open("00-old.dat", "r") as input:
    with open("00-new.dat", "w") as output:  # "w", not "wb": we write text, not bytes
        for line in input:
            if "41640" not in line:
                output.write(line)
To delete multiple lines you can use all() combined with a generator expression, as for instance described in this post:

            if all(nb not in line for nb in del_list):
                output.write(line)

where del_list is a list of the values you want deleted:

    del_list = ["41615", "41640", "42334"]
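Note that a substring test like "41640" not in line would also drop a row whose ID merely contains 41640 (e.g. 141640). A slightly safer variant, sketched below as a standalone function, compares only the first whitespace-separated column against del_list; the function name filter_rows and the sample rows are illustrative, not from the original post:

```python
def filter_rows(lines, del_list):
    """Keep only lines whose first column is not listed in del_list."""
    kept = []
    for line in lines:
        fields = line.split()
        # keep blank lines, and lines whose ID (first column) is not unwanted
        if not fields or fields[0] not in del_list:
            kept.append(line)
    return kept


rows = [
    "41611 2014 12 18 0 0\n",
    "41640 2014 6 14 3 3\n",
    "42330 2014 8 13 7 7\n",
    "42334 2014 12 18 0 0\n",
]

# the rows starting with 41640 and 42330 are dropped
print(filter_rows(rows, ["41640", "42330"]))
```

You can feed the function the lines of the open input file directly and write the returned list to the output file.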
As a side note, operator precedence is not actually the problem in your original condition: in Python, + binds tighter than !=, so "41640" + "\n" is concatenated first and line is compared against the single string "41640\n". The condition is always True simply because each line holds the whole row of data, never just "41640\n".
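The precedence behavior and the failed comparison can be checked interactively; the sample line below is taken from the data in the question:

```python
line = "41640 2014 6 14 3 3\n"

# + is evaluated before !=, so the right-hand side is one string: "41640\n"
print("41640" + "\n" == "41640\n")   # True: concatenation happens first

# the whole row never equals just the ID plus a newline, so != stays True
print(line != "41640" + "\n")        # True for every line in the file
```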