python user-input

How can I check whether a given file is FASTA?


I am writing a program that requires a .fasta file as input at an early stage. Right now, I validate the input with this function:

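import os

def file_validation(fasta):
    # `fasta` is the prompt text shown to the user.
    while True:
        file_name = raw_input(fasta)  # Python 2; use input() on Python 3
        # raw_input never opens the file, so check for existence explicitly.
        if not os.path.isfile(file_name):
            print("Please give the name of the fasta file that exists in the folder!")
        elif not file_name.endswith(".fasta"):
            print("Please give the name of the file with the .fasta extension!")
        else:
            return file_name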

Although this function works, there is still room for error: a user could supply a file whose name ends with .fasta but whose contents are not FASTA. How can I detect this and tell the user that their .fasta file is corrupted?


Solution

  • Why not just parse the file as if it were FASTA and see whether it breaks?

    Biopython does not raise an error on non-FASTA files; its parser silently returns an empty generator, so you can test whether any record was produced:

    from Bio import SeqIO
    
    my_file = "example.csv"  # Obviously not FASTA
    
    def is_fasta(filename):
        with open(filename, "r") as handle:
            fasta = SeqIO.parse(handle, "fasta")
            return any(fasta)  # False when `fasta` is empty, i.e. wasn't a FASTA file
    
    is_fasta(my_file)
    # False
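
The check above only asks whether the parser found at least one record, so a file with stray text around a ">" line may still pass, and the exact behavior may differ between Biopython versions. As a rough sketch of a stricter, dependency-free check, the function below requires the first non-blank line to be a ">" header and every sequence line to use only an allowed character set. The name looks_like_fasta and the ALLOWED set (letters plus "*" and "-") are assumptions made for illustration, not part of Biopython or of any FASTA specification:

```python
import string

# Assumed set of valid sequence characters: letters plus "*" (stop) and "-" (gap).
ALLOWED = set(string.ascii_uppercase + "*-")

def looks_like_fasta(filename, allowed=ALLOWED):
    """Return True if the file starts with a '>' header and every
    sequence line contains only characters from `allowed`."""
    seen_header = False
    with open(filename, "r") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line:
                continue  # tolerate blank lines
            if line.startswith(">"):
                seen_header = True
            elif not seen_header:
                return False  # data before the first header
            elif not set(line.upper()) <= allowed:
                return False  # unexpected character in a sequence line
    return seen_header  # an empty file has no header, so it fails
```

Calling looks_like_fasta(file_name) inside the input loop, after the extension check, would let the program ask the user for another file when the contents do not look like FASTA.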