I am putting together a list of URLs that contain XML snippets, to run wget on later. Each URL differs from the others in only one small part, and I have a list of the values to fill in. Is there an easy way to replace that part of each URL with a different value? I know Sublime Text can replace the same text on several lines with one new text; I am asking whether there is a way to do that when the replacement is different for each line.
I am trying to access data from a biology database (Ensembl 97), and one option is to run wget on a URL they provide that contains an XML query. I want to reuse these queries in the future for different species and gene attributes. For example, I now have a query that retrieves attributes for a list of genes for one species (Algerian mouse); I want to modify it and use it for 90 other species.
I have the following command to get the information I want for the species "cabingdonii", which is selected by the last Attribute name definition at the end of the line:
wget -O cabingdonii.txt 'http://www.ensembl.org/biomart/martservice?query=<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" ><Dataset name = "hsapiens_gene_ensembl" interface = "default" ><Filter name = "ensembl_gene_id" value = "ENSG00000196565"/><Attribute name = "ensembl_gene_id" /><Attribute name = "cabingdonii_homolog_orthology_type" /></Dataset></Query>'
I also have a list of three other species: mspretus, vpacos, mmarmota.
I want to repeat the wget command three more times, each time changing the
<Attribute name = "cabingdonii_homolog_orthology_type" />
into the attribute name of another species, like:
<Attribute name = "mspretus_homolog_orthology_type" />
<Attribute name = "vpacos_homolog_orthology_type" />
<Attribute name = "mmarmota_homolog_orthology_type" />
The rest of the command should stay the same. I tried writing a for loop in Python, but all the single and double quotes, as well as the slashes, make the string really hard to modify, especially since the real query is much longer than this example.
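The substitution described above can be sketched in bash by keeping the query as one single-quoted template with a placeholder and letting parameter expansion fill in each species. The @SPECIES@ marker and the urls.txt file name are hypothetical choices for this example, and the single quotes only work because the query itself contains no single quotes.

```shell
#!/usr/bin/env bash
# Sketch: turn one query template into one URL per species.
# @SPECIES@ is a hypothetical placeholder marker chosen for this example;
# urls.txt is a hypothetical output file name.
set -euo pipefail

# Single quotes keep every " and / in the XML literal, so nothing needs escaping.
# Paste the full (longer) query here, with @SPECIES@ where the species name goes.
template='http://www.ensembl.org/biomart/martservice?query=<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" ><Dataset name = "hsapiens_gene_ensembl" interface = "default" ><Filter name = "ensembl_gene_id" value = "ENSG00000196565"/><Attribute name = "ensembl_gene_id" /><Attribute name = "@SPECIES@_homolog_orthology_type" /></Dataset></Query>'

# ${template//@SPECIES@/$s} replaces every occurrence of the marker with $s;
# each resulting URL is written on its own line of urls.txt.
for s in mspretus vpacos mmarmota; do
  printf '%s\n' "${template//@SPECIES@/$s}"
done > urls.txt
```

Because the template is typed only once, outside the loop, the length of the real query does not matter; the placeholder just has to be a string that appears nowhere else in the query.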
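Use double quotes around the URL so the shell expands the loop variable $F inside it, and escape the double quotes that belong to the XML as \":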
$ for F in mspretus_homolog_orthology_type vpacos_homolog_orthology_type mmarmota_homolog_orthology_type ; do echo -n "$F " && wget -q -O - "http://www.ensembl.org/biomart/martservice?query=<?xml version=\"1.0\" encoding=\"UTF-8\"?><Query virtualSchemaName = \"default\" formatter = \"TSV\" header = \"0\" uniqueRows = \"0\" count = \"\" datasetConfigVersion = \"0.6\" ><Dataset name = \"hsapiens_gene_ensembl\" interface = \"default\" ><Filter name = \"ensembl_gene_id\" value = \"ENSG00000196565\"/><Attribute name = \"ensembl_gene_id\" /><Attribute name = \"$F\" /></Dataset></Query>" ; done
mspretus_homolog_orthology_type ENSG00000196565
vpacos_homolog_orthology_type ENSG00000196565
mmarmota_homolog_orthology_type ENSG00000196565
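If you would rather keep the URL exactly as copied from Ensembl (single-quoted, no backslashes), a sketch of another approach is to close the single quote just before the species name, insert "$species", and reopen it. The version below, assuming the species names live one per line in a file, writes one wget command per species into a script to run later, saving each result to its own file as in the cabingdonii.txt example. species.txt and download.sh are hypothetical file names, and printf %q is a bash feature, so the generated script is meant to be run with bash.

```shell
#!/usr/bin/env bash
# Sketch: write one wget command per species into a script to run later.
# species.txt and download.sh are hypothetical file names for this example.
set -euo pipefail

# Example species list, one name per line (replace with the full list).
printf '%s\n' mspretus vpacos mmarmota > species.txt

while IFS= read -r species; do
  [ -n "$species" ] || continue   # skip blank lines
  # The URL stays single-quoted as copied; only the species name is spliced in
  # by closing the quote, adding "$species", and reopening the quote.
  url='http://www.ensembl.org/biomart/martservice?query=<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" ><Dataset name = "hsapiens_gene_ensembl" interface = "default" ><Filter name = "ensembl_gene_id" value = "ENSG00000196565"/><Attribute name = "ensembl_gene_id" /><Attribute name = "'"$species"'_homolog_orthology_type" /></Dataset></Query>'
  # %q quotes each argument so bash reads the line back unchanged.
  printf 'wget -O %q %q\n' "$species.txt" "$url"
done < species.txt > download.sh
```

Running `bash download.sh` afterwards downloads every species in turn, for example into mspretus.txt, vpacos.txt and mmarmota.txt.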
Note: bioinformatics questions like this one are a good fit for https://biostars.org or https://bioinformatics.stackexchange.com/