This question is directly linked to my "How to modify a tsv-file column with Python" question. Briefly: I'd like to overwrite the first column of a TSV file, replacing a certain symbol (in_char) with another one (out_char).
In order to write over the original file, I thought to use the .truncate() method by writing this:
    with open(my_file, "r+") as mf:
        lines = [line.rstrip() for line in mf]
        for line in lines:
            line = line.replace(in_char, out_char, 1)
            mf.seek(0)
            mf.write(line)
            mf.truncate()
    mf.close()
Actually the file is correctly overwritten but only with the last row of the TSV, so I basically obtain a TSV with one row.
For example, if my in_char is the "|" symbol and my out_char is the "_" symbol, from my original TSV:
A|circ properties m4 298 298 28 + . coverage=81;
B|circ properties m4 307 307 40 - . coverage=74;
C|circ properties m4 361 361 23 + . coverage=77;
This is what I obtain:
C_circ properties m4 361 361 23 + . coverage=77;
What am I doing wrong?
The problem is that you are modifying the file as you read it. I suggest you take one of two approaches:
1. Read the entire file into memory, make the modifications, then write the file back out.
2. Create a temporary file to write to. Read the input file one line at a time, make the changes and write each line to the temporary file. Then rename the temporary file back to the original.
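Both approaches could look roughly like the sketch below. It assumes the columns are separated by tabs and, for the first function, that the file fits in memory. The names fix_first_column, rewrite_in_memory and rewrite_via_temp_file are made up for this illustration.

```python
import os
import tempfile


def fix_first_column(line, in_char, out_char):
    """Replace in_char with out_char in the first tab-separated field only."""
    first, sep, rest = line.partition("\t")
    return first.replace(in_char, out_char) + sep + rest


def rewrite_in_memory(path, in_char, out_char):
    """Approach 1: read everything, modify, then write the file back out."""
    with open(path, "r", newline="") as f:
        lines = f.readlines()  # each line keeps its trailing newline
    with open(path, "w", newline="") as f:  # mode "w" empties the file first
        for line in lines:
            f.write(fix_first_column(line, in_char, out_char))


def rewrite_via_temp_file(path, in_char, out_char):
    """Approach 2: stream into a temporary file, then swap it into place."""
    # Create the temporary file next to the original so the final rename
    # stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "r", newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as tmp:
        for line in src:
            tmp.write(fix_first_column(line, in_char, out_char))
    # Both files are closed here; os.replace overwrites the original.
    os.replace(tmp.name, path)
```

Calling rewrite_in_memory(my_file, "|", "_") or rewrite_via_temp_file(my_file, "|", "_") should turn A|circ into A_circ on every row. Note that the second version leaves the file with the permissions of the temporary file, which may differ from those of the original.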
As an aside, I suggest using the standard csv module for this. In particular, DictReader and DictWriter make this task straightforward.
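Here is a minimal sketch of the csv route. Since the sample file has no header row, it uses the plain csv.reader and csv.writer rather than DictReader and DictWriter, which need field names. The function name rewrite_first_column is made up for this example, and it assumes the fields contain no quote characters that the default csv dialect would treat specially.

```python
import csv


def rewrite_first_column(path, in_char, out_char):
    """Replace in_char with out_char in the first column of a TSV file."""
    # Read every row into memory; each row is a list of field strings.
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))

    for row in rows:
        if row:  # skip blank lines, which come back as empty lists
            row[0] = row[0].replace(in_char, out_char)

    # Reopen in "w" mode, which truncates the file, and write all rows back.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
```

If the file did have a header row, DictReader and DictWriter with delimiter="\t" would let you address the column by name instead of by index.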