Tags: bash, text, sed, fasta

Modify text file based on file's name, repeat for all files in folder


I have a folder with several files named something_1001.txt, something_1002.txt, something_1003.txt, etc. Each file contains different text, but the structure is always the same: some lines are identified by the string ">TEXT", and those are the ones I am interested in.

So my goal is:

  • for each file in the folder, read the file's name and extract the number between "_" and ".txt"
  • modify all the lines in that file that contain the string ">TEXT" so that it becomes ">{NUMBER}_TEXT"
  • For example: in file "something_1001.txt", change ">TEXT" to ">1001_TEXT" on every line that contains it; then move on to file "something_1002.txt" and change ">TEXT" to ">1002_TEXT"; etc.

Here is the code I wrote so far :

for i in /folder/*.txt
NAME=`echo $i | grep -oP '(?<=something_/).*(?=\.txt)'`
do  
    sed -i -e 's/>TEXT/>${NAME}_TEXT/g' /folder/something_${NAME}.txt
done

I created a small bash script to run the code, but it's not working. There seem to be syntax errors and a loop error, but I can't figure out where.

Any help would be most welcome !


Solution

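  • There are two problems here. One is that your loop syntax is wrong: the do keyword must directly follow the for line, so the NAME assignment belongs inside the loop body, after do. The other is that you are using single quotes around the sed script, which prevents the shell from interpolating your variable; use double quotes instead.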

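    The grep can be avoided, anyway (as written, its lookbehind contains a stray "/" after "something_", so it would never match); the shell has good built-in facilities for trimming a known prefix and suffix off a file name.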

    for i in /folder/*.txt
    do  
        base=${i#/folder/something_}
        sed -i -e "s/>TEXT/>${base%.txt}_TEXT/" "$i"
    done
    

    The shell's ${var#prefix} and ${var%suffix} parameter expansions produce the value of $var with the given prefix or suffix, respectively, trimmed off. Both prefix and suffix are glob patterns, and $var itself is left unchanged.
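To see the two expansions in isolation, here is a minimal sketch using a hypothetical path that follows the naming scheme from the question. The variable names (file, base, number, alt) are only illustrative. The second variant uses ${var##pattern}, which removes the longest matching prefix, so it does not need to know the folder name.

```shell
# Hypothetical path, following the naming scheme in the question.
file=/folder/something_1001.txt

base=${file#/folder/something_}   # trim the leading "/folder/something_"
number=${base%.txt}               # trim the trailing ".txt"
echo "$number"

# Alternative that does not hard-code the folder:
alt=${file##*_}                   # trim everything up to and including the last "_"
alt=${alt%.txt}                   # trim the trailing ".txt"
echo "$alt"
```

Both echo commands should print 1001 for this path.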

    As an aside, avoid all-uppercase variable names, because by convention those are reserved for environment variables and the shell's own variables (PATH, HOME, and so on), and take care to double-quote any variable whose contents may include whitespace or shell metacharacters.
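Putting the naming and quoting advice together, the loop could be wrapped in a small function along these lines. This is only a sketch: tag_headers is a hypothetical helper name, it assumes the files are called something_<number>.txt as in the question, and it writes to a temporary file instead of using sed -i, since the -i option is not standardized and behaves differently between GNU and BSD sed.

```shell
# tag_headers DIR: hypothetical helper, not part of the original answer.
# For every DIR/something_<number>.txt, rewrite ">TEXT" as "><number>_TEXT".
tag_headers() {
    dir=$1
    for file in "$dir"/something_*.txt
    do
        [ -e "$file" ] || continue   # glob matched nothing: skip the literal pattern
        number=${file##*_}           # trim everything up to the last "_"
        number=${number%.txt}        # trim the ".txt" extension
        # Write to a temporary file, then move it into place, so the
        # sketch does not depend on the non-standard "sed -i" option.
        sed "s/>TEXT/>${number}_TEXT/" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    done
}
```

It would be called as tag_headers /folder. Every expansion is double-quoted, so directory names containing spaces should not break the loop.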