I've been running simulations on a cluster, and I would like to check intermediate results by going through all the cluster nodes and copying the files I need.
What I've been trying to do is extract the job ID and node name as strings from the output of qstat -rn -u djsavic, which looks like this:
fermi:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- ----------- -------- ---------------- ------ ----- ------ ------ -------- - --------
59281.fermi djsavic xlarge Smith2 30676 1 2 -- 96:00:00 R 24:19:14
fermi-node08/1+fermi-node08/0
59282.fermi djsavic xlarge Smith2 30686 1 2 -- 96:00:00 R 24:18:56
fermi-node08/3+fermi-node08/2
59283.fermi djsavic xlarge Smith2 30700 1 2 -- 96:00:00 R 24:18:56
fermi-node08/5+fermi-node08/4
59284.fermi djsavic xlarge Smith2 30729 1 2 -- 96:00:00 R 24:21:09
fermi-node08/7+fermi-node08/6
59285.fermi djsavic xlarge Smith2 9076 1 2 -- 96:00:00 R 24:19:24
fermi-node07/1+fermi-node07/0
59286.fermi djsavic xlarge Smith2 9078 1 2 -- 96:00:00 R 24:19:23
fermi-node07/3+fermi-node07/2
59287.fermi djsavic xlarge Smith2 9079 1 2 -- 96:00:00 R 24:19:41
fermi-node07/5+fermi-node07/4
59288.fermi djsavic xlarge Smith2 9080 1 2 -- 96:00:00 R 24:19:57
fermi-node07/7+fermi-node07/6
In reality, the list is longer, around 80 lines.
What I need is each job's ID and node name, so that I can copy files from, e.g., the directory fermi-node08/59281/ to some /location
After a lot of digging and searching through the internet, so far I have come up with something like this:
for i in `qstat -rn -u djsavic`; do
for j in `echo $i|grep fermi`; do
echo $j|sed -r 's/(.{12}).*/\1/'|sed 's/.fermi//';
done;
done
and what I get is a list like this:
fermi:
59281
fermi-node08
59282
fermi-node08
59283
fermi-node08
59284
fermi-node08
59285
fermi-node07
59286
fermi-node07
59287
fermi-node07
59288
fermi-node07
At this point, I would like to copy the files from every /fermi-node##/JobID/ directory to a desired location, and also to remove the fermi: line from the top of the list. I am new to bash scripting and would really appreciate it if anyone could help me with the final step.
Thanks in advance.
awk to the rescue!
If your input is in that form (three header lines, followed by records that each span two lines), you can extract the information you need with this:
$ awk 'NR>3{ if(!(NR%2)) {sub(/\.fermi$/,"",$1); n=$1}
             else {sub("/.*","",$1); print $1"/"n}}' file
fermi-node08/59281
fermi-node08/59282
fermi-node08/59283
fermi-node08/59284
fermi-node07/59285
fermi-node07/59286
fermi-node07/59287
fermi-node07/59288
You can feed this into a while loop for further processing, for example:
$ while read -r f; do echo "$f"; done < <(awk ...)
Just replace echo "$f" with whatever you want to do with each node/JobID pair.
UPDATE: If the number of header lines is not fixed, matching on the job ID pattern instead of counting lines may be more robust:
$ awk '/^[0-9]+\.fermi/ {sub(/\.fermi$/,"",$1); n=$1; next}
       n{sub("/.*","",$1); print $1"/"n; n=""}' file
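For the final copy step, here is a minimal sketch that combines the pattern-based awk extraction with a loop. It rests on assumptions that are not confirmed by the question: that each node's job directory is reachable as an ordinary path of the form <root>/fermi-node##/JobID (for example through a shared mount), and that a plain cp -r is enough. The function name copy_job_dirs and its two arguments are hypothetical, chosen only for illustration. If the node directories are only reachable over the network, cp would have to be swapped for something like scp or rsync.

```bash
#!/usr/bin/env bash
# Sketch only: copy every <src_root>/<node>/<jobid> directory to <dest>/<node>/<jobid>.
# Assumption: the node directories are visible as local paths under src_root.
# copy_job_dirs is a hypothetical helper name, not part of qstat or awk.

copy_job_dirs() {
    local src_root=$1 dest=$2 pair
    mkdir -p "$dest"
    # Read qstat -rn output on stdin and emit one "node/jobid" pair per job.
    # Header lines (including "fermi:") never match the job ID pattern,
    # so they are dropped without counting lines.
    awk '/^[0-9]+\.fermi/ { sub(/\.fermi.*/, "", $1); n = $1; next }
         n                { sub(/\/.*/, "", $1); print $1 "/" n; n = "" }' |
    while IFS= read -r pair; do
        if [ -d "$src_root/$pair" ]; then
            mkdir -p "$dest/$pair"
            # The trailing "/." copies the directory contents, hidden files included.
            cp -r "$src_root/$pair/." "$dest/$pair/"
        else
            echo "skipping missing directory: $src_root/$pair" >&2
        fi
    done
}

# Example call (both paths are placeholders):
#   qstat -rn -u djsavic | copy_job_dirs "" /location
```

Keeping the node name in the destination path means results from different nodes do not overwrite each other. Passing an empty string as the first argument makes the source paths start at /fermi-node##/JobID, as written in the question.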