How do I change part of a file name when it is a variable in python?

I currently have a python script which takes a file as a command-line argument, does what it needs to do, and then outputs that file with _all_ORF.fsa_aa appended. I'd like to actually edit the file name rather than appending, but I am getting confused with variables. I'm not sure how I can actually do it when the file is a variable.

Here's an example of the command-line argument:

gL=genomeList.txt   #Text file containing a list of genomes to loop through.             

for i in $(cat ${gL}); do
    #some other stuff ; 
    python ./find_all_ORF_from_getorf.py ${i}_getorf.fsa_aa ; 
    done

Here is some of the python script (find_all_ORF_from_getorf.py):

import re, sys

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

infile = sys.argv[1]

with open(f'{infile}_all_ORF.fsa_aa'.format(), "a") as file_object:
    for sequence in SeqIO.parse(infile, "fasta"):
       #do some stuff
       print(f'{sequence.description}_ORF_from_position_{h.start()},\n{sequence.seq[h_start:]}', 
       file=file_object)

Currently, the output file is called Genome_file_getorf.fsa_aa_all_ORF.fsa_aa. I'd like to remove the first fsa_aa so that the output looks like this: Genome_file_getorf_all_ORF.fsa_aa. How do I do this? I can't work out how to edit it.

I have had a look at the os.rename function, but that doesn't seem to be able to edit the name held in the variable, just append to it.

Thanks,

Solution

Regarding your bash code, you might find the following snippet useful. I find it a little more readable, and I use it a lot when iterating over the lines of a file.

while read -r line; do
    #some other stuff
    python ./find_all_ORF_from_getorf.py "${line}_getorf.fsa_aa"
done < genomeList.txt

Now, regarding your question and your Python code:

import re, sys 

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

infile = sys.argv[1]

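At this point infile will look like 'Genome_file_getorf.fsa_aa'. One option is to split this string on '.' and take the first item. This assumes there is no other '.' earlier in the name or path (a leading './', for example, would leave you with an empty string):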

name = infile.split('.')[0]

If there might be several '.' in the file name, as in 'Myfile.out.old', and you only want to remove the last extension:

name = infile.rsplit('.',1)[0]

A third option: if you know that all your files end with '.fsa_aa', you can slice the string using a negative index. As '.fsa_aa' has 7 characters:

name = infile[:-7]

These three options rely on Python's built-in string handling (str.split, str.rsplit and slicing); see the official Python docs for more.
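As a quick check of how the three options behave, here is a minimal sketch. The file names in it are illustrative values taken from the examples above, not real files, and the variable names are only for this demonstration.

```python
# Hypothetical input name, matching the example in the question.
infile = "Genome_file_getorf.fsa_aa"

# Option 1: keep everything before the first dot.
name_split = infile.split(".")[0]

# Option 2: drop only the last extension.
name_rsplit = infile.rsplit(".", 1)[0]

# Option 3: slice off the known suffix ('.fsa_aa' is 7 characters long).
name_slice = infile[:-len(".fsa_aa")]

# For this name, all three give 'Genome_file_getorf'.
print(name_split, name_rsplit, name_slice)

# The options differ once the name contains more than one dot.
dotted = "Myfile.out.old"
first_dot = dotted.split(".")[0]      # 'Myfile'
last_dot = dotted.rsplit(".", 1)[0]   # 'Myfile.out'

# A relative path starting with './' defeats split('.')[0],
# while rsplit('.', 1)[0] still works.
relative = "./Genome_file_getorf.fsa_aa"
broken = relative.split(".")[0]       # '' (empty string)
intact = relative.rsplit(".", 1)[0]   # './Genome_file_getorf'
```

If the script may be called with a path rather than a bare file name, rsplit (or pathlib) is the safer choice.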

outfile = f'{name}_all_ORF.fsa_aa' 
# With an f-string, f'{variable}', the trailing ".format()" is not needed.
# Alternatively, you can write '{}'.format(variable)
# or even '{variable}'.format(variable=SomeOtherVariable)

with open(outfile, "a") as file_object:
    for sequence in SeqIO.parse(infile, "fasta"):
       #do some stuff
       # write() does not append a newline the way print() does, so add it explicitly
       file_object.write(f'{sequence.description}_ORF_from_position_{h.start()},\n{sequence.seq[h_start:]}\n')
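The formatting styles mentioned in the comments above all produce the same string. A small sketch, using a made-up value for name, to confirm this:

```python
name = "Genome_file_getorf"  # hypothetical value, for illustration only

a = f"{name}_all_ORF.fsa_aa"                    # f-string
b = "{}_all_ORF.fsa_aa".format(name)            # positional str.format
c = "{stem}_all_ORF.fsa_aa".format(stem=name)   # keyword str.format

# Calling .format() with no arguments on an f-string result changes
# nothing here, because the finished string has no '{}' left in it.
d = f"{name}_all_ORF.fsa_aa".format()

print(a == b == c == d)
```

Pick one style and stick with it; mixing an f-string with .format() only adds noise.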

Another option is to use Path from the pathlib library, and I do suggest that you play with this library a bit. In this case you would have to make a few other minor changes to the code:

import re, sys
from pathlib import Path # <- Here

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

infile = Path(sys.argv[1]) # <- Here
outfile = infile.stem + '_all_ORF.fsa_aa' # <- Here 
# outfile above is a plain string with no directory part. To get a Path
# that sits in the same directory as the input file, I would suggest instead
# outfile = infile.with_name(infile.stem + '_all_ORF.fsa_aa')

with open(outfile, "a") as file_object:
    for sequence in SeqIO.parse(infile, "fasta"):
       #do some stuff
       # write() does not append a newline the way print() does, so add it explicitly
       file_object.write(f'{sequence.description}_ORF_from_position_{h.start()},\n{sequence.seq[h_start:]}\n')
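To see what the pathlib attributes return, here is a minimal sketch. The directory name data is an assumption made up for the example; nothing is read from or written to disk.

```python
from pathlib import Path

# Hypothetical input path; the 'data' directory is invented for this example.
infile = Path("data/Genome_file_getorf.fsa_aa")

stem = infile.stem      # file name without its last extension
suffix = infile.suffix  # the last extension, including the dot
parent = infile.parent  # the directory part

# Build the output path next to the input file.
outfile = infile.with_name(stem + "_all_ORF.fsa_aa")

print(stem)
print(suffix)
print(outfile.as_posix())
```

Note that stem only strips the last extension, so it behaves like rsplit('.', 1)[0] applied to the file name rather than like split('.')[0].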

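Finally, as you have seen, in both cases I replaced the print(..., file=file_object) call with file_object.write(...). Both work; I find write more explicit when the target is always a file. Keep in mind that, unlike print, write does not append a newline, so you need to end the string with '\n' yourself.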
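One practical difference between the two calls is the trailing newline. The sketch below uses in-memory buffers (io.StringIO) instead of real files, and the record text is a made-up placeholder:

```python
import io

# Placeholder text standing in for one formatted record.
record = "some_description_ORF_from_position_12,\nMKV"

printed = io.StringIO()
print(record, file=printed)   # print() appends '\n' after the text

written = io.StringIO()
written.write(record)         # write() emits the string exactly as given

print(repr(printed.getvalue()))
print(repr(written.getvalue()))
```

Without the explicit '\n', consecutive records written with write would run together on the same line.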