I have a count file containing IDs and counts in each of several directories (one per accession, SRRXXXXX). I want to add a header line, "gene_id" followed by the accession name (e.g. SRRabcdXXX), to each file using a bash loop.
Directory structure like this:
SRRabcd/
    count.txt
SRRefgh/
    count.txt
MY FILE(s)
gene1 194
gene2 40
WHAT I AM DOING
#!/bin/bash
for dir in /home/path/to/dir/SRR*/
do
sed -i '1s/^/gene_id\t"${dir}"\n/' "$dir"/count.txt
done
MY OUTPUT
gene_id "${dir}"
gene1 194
gene2 40
My Desired Output (for individual files)
gene_id SRRabcdef
gene1 194
gene2 40
To replace ${dir} with its actual value, you need to ensure ${dir} is wrapped in double quotes. While you do have "${dir}" in your sed script, it is embedded in a pair of single quotes, which negates the inner double quotes; the net result is that you end up with the literal string "${dir}" in your output.
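The quoting difference can be seen without sed at all. Below is a minimal sketch, assuming a hypothetical sample value SRRabcd for dir; the variable names single and double are only for illustration:

```shell
dir="SRRabcd"   # hypothetical sample value

# Single quotes: everything inside is literal, including the inner double quotes
single='gene_id\t"${dir}"'

# Double quotes: the shell expands ${dir} before sed ever sees the script
double="gene_id\t${dir}"

printf '%s\n' "$single"   # gene_id\t"${dir}"
printf '%s\n' "$double"   # gene_id\tSRRabcd
```

The \t stays as a backslash followed by t in both cases; it is sed, not the shell, that later turns it into a tab.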
One easy approach would be to concatenate three strings to form your sed script, e.g.:
# '1s/^/gene_id\t' + "${dir}" + '\n/'
sed '1s/^/gene_id\t'"${dir}"'\n/'
But the simpler (and recommended) approach is to ensure the entire sed script is wrapped in double quotes, e.g.:
sed "1s/^/gene_id\t${dir}\n/" "$dir"/count.txt
    ^                       ^
Sample data:
$ head SRR*/count.txt
==> SRRabcd/count.txt <==
gene1 194
gene2 40
==> SRRefgh/count.txt <==
gene1 395
gene2 17
Modified script:
for dir in SRR*
do
echo "########## $dir"
sed "1s/^/gene_id\t${dir}\n/" "$dir"/count.txt
done
This generates:
########## SRRabcd
gene_id SRRabcd
gene1 194
gene2 40
########## SRRefgh
gene_id SRRefgh
gene1 395
gene2 17
Once you confirm the results are correct, you can add the -i flag to edit the files in place.
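Note that the loop in the question iterates over absolute paths with a trailing slash (/home/path/to/dir/SRR*/), so ${dir} would contain slashes that collide with the / delimiters of the sed s command, and the header would not be the bare accession. One possible way to handle that is to take the basename first. The following is only a sketch: it assumes GNU sed (for -i and for \t and \n in the replacement), and base and acc are hypothetical names, with a temporary directory standing in for the real parent directory so the example is self-contained:

```shell
#!/bin/bash
# Hypothetical parent directory; a temp dir with sample data stands in for the real one
base=$(mktemp -d)
mkdir -p "$base/SRRabcd" "$base/SRRefgh"
printf 'gene1 194\ngene2 40\n' > "$base/SRRabcd/count.txt"
printf 'gene1 395\ngene2 17\n' > "$base/SRRefgh/count.txt"

for dir in "$base"/SRR*/
do
    # Strip the leading path and the trailing slash so only the accession remains
    acc=$(basename "$dir")
    # $dir already ends in a slash, so the file name is appended directly
    sed -i "1s/^/gene_id\t${acc}\n/" "${dir}count.txt"
done

# Show the modified files
head "$base"/SRR*/count.txt
```

Because 1s/^/.../ only acts when a first line exists, an empty count.txt would be left unchanged by this approach.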