I have a question about a one-line command that searches multiple files in subfolders for TWO patterns and prints only the numerical values.
Example:
Current directory (where the script is run): $HOME/work/A/
Subfolders containing data: $HOME/work/A/trial1, trial2, trial3..
Input (each data file): eg. trial1/trial1.out
[text]
..
cutoff = 100
..
[text]
..
! total energy= -23.4387 Ry
..
Need output: /A/totalenergy.txt
100 -23.4387
110 -23.2523
120 -24.0134
...
My initial plan was to use grep on each file, match the patterns 'cutoff =' and '! ' to find the two desired lines, and print only the cutoff number and the energy number.
However, so far I can only search for one pattern, '! total energy' (the more important one), and use grep | tr | cut > file to get only the energy out.
grep -e "\!" */*.out | tr -s ' ' | cut -f5 -d' ' >totalenergy.txt
Basically, I grep for '!' in every *.out file in the subfolders, squeeze repeated spaces, and keep only the numerical field.
After grep, the line that contains '! total energy' looks like this:
60/C.scf_60.out:! total energy = -22.78085574 Ry
So, if I can somehow get the first number out from this line, plus what I have, I can also achieve my goal:
60 -22.78085574
I am trying to do this with a one-line command.
Thanks!
sed -rn -e 's/cutoff[ =]+([0-9]+)/\1/p' -e 's/.*total energy[= ]+([0-9.-]+).*/\1:/p' */*.out | tr '\n:' ' \n'
sed -rn -e <cmd1> -e <cmd2> */*.out
I've used sed instead of grep because I needed a marker character (I chose :) to separate the records (cutoff, total energy).
-r # short form of --regexp-extended
Needed for the syntax I've used, especially ([0-9.-]+): I don't need to escape the parentheses, and I can match . and - inside the bracket expression without problems.
-n # short form of --quiet or --silent
It disables the automatic printing of every line, so nothing is printed unless we explicitly ask for it (with the p flag).
-e # short form of --expression
Useful for combining multiple commands.
cutoff[ =]+([0-9]+)/\1
.*total energy[= ]+([0-9.-]+).*/\1:
In both expressions I'm just saving the value I need in \1.
Notice that I appended a : character after the value matched for total energy. As I said, it helps me separate the records with tr.
's/../../p'
I've used the p flag to print the result of each successful substitution, because I had disabled automatic printing with -n. Together they discard all the lines with no match.
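To make the interplay of -n and p concrete, here is a minimal sketch. The sample text is hypothetical, modelled on the trial1.out excerpt in the question, and -r assumes GNU sed (or another sed that accepts it).

```shell
# Hypothetical sample input, modelled on the question's trial1.out
sample=$(printf '%s\n' '[text]' 'cutoff = 100' '[text]' '! total energy= -23.4387 Ry')

# Without -n, sed prints every line, and the p flag prints the matched line a second time
without_n=$(printf '%s\n' "$sample" | sed -r 's/cutoff[ =]+([0-9]+)/\1/p')

# With -n, only lines where the substitution succeeded are printed
with_n=$(printf '%s\n' "$sample" | sed -rn 's/cutoff[ =]+([0-9]+)/\1/p')

printf 'without -n:\n%s\n\nwith -n:\n%s\n' "$without_n" "$with_n"
```

With -n, the only output should be the captured cutoff value; without it, the unmatched lines pass through untouched and the matched line appears twice.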
tr '\n:' ' \n'
Because sed outputs each value on a separate line, I used a marker (:) to know where to write a newline (\n).
tr translates the characters in SET1 ('\n:') to the ones in SET2 (' \n'). Each character in SET1 is replaced by the character at the same position in SET2:
# \n -> " " (space)
# : -> \n
Note: You might want to pipe the output once more (| tr -s ' ') to clean it up.
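The tr mapping can be checked in isolation. The sketch below feeds tr a hand-written stand-in for the sed output (hypothetical values taken from the question's example) and shows that every line after the first starts with a stray space, because the newline that followed each : has become a space. The final sed 's/^ //' is an assumption of this sketch, one possible way to strip that space, not part of the command above.

```shell
# Stand-in for the sed output: one value per line, each energy followed by ':'
sed_output=$(printf '100\n-23.4387:\n110\n-23.2523:\n')

# Raw result of the translation: '\n' -> ' ' and ':' -> '\n'
raw=$(printf '%s\n' "$sed_output" | tr '\n:' ' \n')

# Same translation, then drop the single leading space tr leaves on later lines
cleaned=$(printf '%s\n' "$sed_output" | tr '\n:' ' \n' | sed 's/^ //')

printf 'raw:\n%s\n\ncleaned:\n%s\n' "$raw" "$cleaned"
```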
A more rigorous way to print the result is to run sed again, so the output is exactly what you want:
sed -rn -e 's/cutoff[ =]+([0-9]+)/\1/p' -e 's/.*total energy[= ]+([0-9.-]+).*/\1:/p' */*.out | tr '\n' ' ' | sed -r "s/([^:]+):[ ]*/\1\n/g"
Notice that up to the first | the command is exactly the same as the one above.
tr '\n' ' '
It just replaces the newlines with spaces.
sed -r "s/([^:]+):[ ]*/\1\n/g"
It captures the text up to each : and prints it followed by a newline, dropping the : and any spaces after it.
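To try the whole pipeline without touching real data, the following sketch builds a throwaway directory that mimics the layout in the question. The directory names, file names and values are hypothetical, and the use of -r and of \n in the replacement assumes GNU sed.

```shell
# Build a temporary tree shaped like the question's layout (hypothetical data)
workdir=$(mktemp -d)
mkdir "$workdir/trial1" "$workdir/trial2"
printf '%s\n' '[text]' 'cutoff = 100' '[text]' '! total energy= -23.4387 Ry' > "$workdir/trial1/trial1.out"
printf '%s\n' '[text]' 'cutoff = 110' '[text]' '! total energy= -23.2523 Ry' > "$workdir/trial2/trial2.out"

# Run the pipeline from the parent directory, as the question does
result=$(cd "$workdir" &&
  sed -rn -e 's/cutoff[ =]+([0-9]+)/\1/p' \
          -e 's/.*total energy[= ]+([0-9.-]+).*/\1:/p' */*.out |
  tr '\n' ' ' |
  sed -r 's/([^:]+):[ ]*/\1\n/g')

printf '%s\n' "$result"

# Remove the temporary tree
rm -rf "$workdir"
```

Redirecting the pipeline to totalenergy.txt instead of capturing it in a variable would give the file asked for in the question. Note that the pairing relies on each file containing the cutoff line before the total energy line, exactly once each.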