
Linking files downloaded in a bash loop to the identifiers in a text list


I have a bash loop that runs a Python program once for each identifier in a list (a text file) to download genomes (files). Is there a way to link each downloaded file to its ID from the list? The downloaded files have names that make them difficult to use later on.

The loop in bash:

for i in $(more 'idpandas.txt'); do echo $i; ncbi-genome-download --format protein-fasta --species-taxid $i archaea,bacteria; done;

Is there any way to do this?


Solution

  • There should certainly be a way, but we need more information: what are the names of the files you are downloading?

    for i in $(<idpandas.txt)
    do
        echo "$i"
        ncbi-genome-download --format protein-fasta --species-taxid "$i" archaea,bacteria
        # DOWNLOAD_NAME is a placeholder: set it to the file the tool actually created
        ln -s "$DOWNLOAD_NAME" "$i"
    done
    

    By the way, don't use "more" to produce the list of elements for the loop: it is a pager, and it will cause problems in this scenario. Creating the link is as easy as the "ln" line above. I suspect your real problem is knowing which file name is generated, and that is something we don't know either.

    A quick and dirty alternative, as suggested in one of my comments, is to store the downloaded files in a folder named after your ID. I don't know how your download script works, but I think the following code should do the trick:

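    for i in $(<idpandas.txt)
    do
        echo "$i"
        # one folder per ID; -p avoids an error if the folder already exists
        mkdir -p "$i"
        # skip this ID if the folder cannot be entered, so nothing lands in the wrong place
        cd "$i" || continue
        ncbi-genome-download --format protein-fasta --species-taxid "$i" archaea,bacteria
        cd ..
    done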
    

    This should give you a set of folders named after the IDs in the idpandas.txt file, each containing the files your tool downloaded for that ID.
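
The folder-per-ID idea can also be combined with the "ln" idea from the first snippet. The following is only a sketch under stated assumptions: the function names (download, fetch_into_id_dir, link_with_id, process_list) are made up for illustration, the only command taken from the question is the ncbi-genome-download call, and the layout of the files it writes is not known here, so the link step simply searches the whole ID folder.

```shell
# download ID: the command from the question, run in the current directory.
# Redefine this function to use a different tool.
download() {
    ncbi-genome-download --format protein-fasta --species-taxid "$1" archaea,bacteria
}

# fetch_into_id_dir ID: create a folder named after the ID and download into it.
fetch_into_id_dir() {
    local id="$1"
    mkdir -p "$id" || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$id" && download "$id" )
}

# link_with_id ID [PATTERN]: for each file under ./ID whose name matches
# PATTERN (default: every file), create a symlink ./ID_<file name>.
# Assumes file names contain no newlines; same-named files overwrite each other.
link_with_id() {
    local id="$1" pattern="${2:-*}" f
    find "$id" -type f -name "$pattern" | while IFS= read -r f; do
        ln -sf "$f" "${id}_$(basename "$f")"
    done
}

# process_list FILE: read one ID per line (no pager, no word splitting).
# The list is read on file descriptor 3 so the download cannot consume it.
process_list() {
    local id
    while IFS= read -r id <&3 || [ -n "$id" ]; do
        [ -n "$id" ] || continue          # skip empty lines
        fetch_into_id_dir "$id" && link_with_id "$id"
    done 3< "$1"
}
```

After sourcing these definitions, running `process_list idpandas.txt` would leave one folder per ID plus, next to the folders, links whose names start with the ID. If only some of the downloaded files matter, a pattern can be passed as the second argument to link_with_id; which pattern fits depends on the names the tool actually produces.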