Tags: python, bioinformatics, argparse, ends-with

Identify file extensions using endswith: not supported for file objects?


The program I am running needs to identify whether an input file is gzipped. The files come in through argparse:

parser.add_argument('file_sources', nargs='*', type=argparse.FileType('r'), default=sys.stdin, help='Accepts one or more fasta files, no arguments defaults to stdin')
options = parser.parse_args()
open_files = options.file_sources

The files are stored in a list; the program loops through it and decides whether each file is gzipped based on the .gz file extension:

open_files = options.file_sources
#if there is nothing in the list, then read from stdin
if not open_files:
    open_files = [sys.stdin]
#if there are files present in argparse then read their extensions
else:
    opened_files = []
    for _file in open_files:
        if _file.endswith(".gz"):
            _fh = gzip.open(_file,'r')
            opened_files.append(_fh)
        else:
            _fh = open(_file,'r')
            opened_files.append(_fh)

The code breaks at _file.endswith(".gz"), giving the error 'file' has no attribute 'endswith'. If I delete the argparse type, _file goes from being a file object to a string. Doing this causes endswith() to work, but now the file is just a string bearing its name.

How can I keep the functionality of the file while also interpreting its file extension (and not having to use absolute paths as in os.path.splitext since I'm just taking files from the program's current directory)?


Solution

  • A file object has no endswith attribute, but a string does, and the file's name is available as the string _file.name. So:

    change:

    if _file.endswith(".gz"):
    

    to:

    if _file.name.endswith(".gz"):
    

    and since _file is a file object rather than a path, pass its name to gzip.open:

    _fh = gzip.open(_file.name,'r')
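Putting the two changes together, the loop could look roughly like the sketch below. It assumes Python 3 (where gzip.open accepts the text mode "rt"), and the helper name reopen_by_extension is made up for illustration; it is not part of argparse or gzip. Note that the else branch does not need open() at all, because argparse.FileType has already opened the file.

```python
import gzip


def reopen_by_extension(file_objects):
    """Return readable text handles, reopening .gz inputs through gzip.

    Hypothetical helper: expects already-open file objects, such as the
    ones produced by argparse.FileType('r').
    """
    handles = []
    for fobj in file_objects:
        # fobj.name is the path string the file object was opened with
        if fobj.name.endswith(".gz"):
            fobj.close()  # discard the plain handle that was opened first
            # "rt" asks gzip for decompressed text rather than bytes
            handles.append(gzip.open(fobj.name, "rt"))
        else:
            # already an open file object; calling open() on it would fail
            handles.append(fobj)
    return handles
```

With the parser from the question, this would be called as opened_files = reopen_by_extension(options.file_sources). Checking os.path.splitext(fobj.name)[1] == ".gz" would be an equivalent test; splitext works on relative paths as well, so absolute paths are not required.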