Tags: bash, text-extraction, pbs

Selecting specific text from text file, BASH scripting


I've been running simulations on a cluster, and I would like to check temporary results by going through all the cluster nodes and copying the files I need.

What I've been trying to do is extract the job ID and node name as strings from the output of qstat -rn -u djsavic, which looks like this:

fermi: 
                                                                               Req'd    Req'd      Elap
Job ID               Username    Queue    Jobname          SessID NDS   TSK    Memory   Time   S   Time
-------------------- ----------- -------- ---------------- ------ ----- ------ ------ -------- - --------
59281.fermi          djsavic     xlarge   Smith2            30676     1      2    --  96:00:00 R 24:19:14
    fermi-node08/1+fermi-node08/0
59282.fermi          djsavic     xlarge   Smith2            30686     1      2    --  96:00:00 R 24:18:56
    fermi-node08/3+fermi-node08/2
59283.fermi          djsavic     xlarge   Smith2            30700     1      2    --  96:00:00 R 24:18:56
    fermi-node08/5+fermi-node08/4
59284.fermi          djsavic     xlarge   Smith2            30729     1      2    --  96:00:00 R 24:21:09
    fermi-node08/7+fermi-node08/6
59285.fermi          djsavic     xlarge   Smith2             9076     1      2    --  96:00:00 R 24:19:24
    fermi-node07/1+fermi-node07/0
59286.fermi          djsavic     xlarge   Smith2             9078     1      2    --  96:00:00 R 24:19:23
    fermi-node07/3+fermi-node07/2
59287.fermi          djsavic     xlarge   Smith2             9079     1      2    --  96:00:00 R 24:19:41
    fermi-node07/5+fermi-node07/4
59288.fermi          djsavic     xlarge   Smith2             9080     1      2    --  96:00:00 R 24:19:57
    fermi-node07/7+fermi-node07/6

In reality, the list is longer, around 80 lines.

What I need are the job ID and node name, so that I can copy files, e.g. from directory fermi-node08/59281/ to some /location.

After a lot of digging and searching through the internet, this is what I have so far:

for i in `qstat -rn -u djsavic`; do
    for j in `echo $i|grep fermi`; do
             echo $j|sed -r 's/(.{12}).*/\1/'|sed  's/.fermi//';
    done;
done

and what I get is a list like this:

fermi:
59281
fermi-node08
59282
fermi-node08
59283
fermi-node08
59284
fermi-node08
59285
fermi-node07
59286
fermi-node07
59287
fermi-node07
59288
fermi-node07

At this point, I would like to copy files from every /fermi-node##/JobID/ directory to a desired location, and also to remove the fermi: line from the top of the list. I am new to bash scripting and would really appreciate it if anyone could help me with the final step.

Thanks in advance.


Solution

  • awk to the rescue!

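    If your input is in that form (three header lines, followed by two-line records), you can extract the information you need with the command below. It relies on the job lines landing on even line numbers and the node lines on odd ones, so adjust the NR tests if your header length differs.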

    $ awk 'NR>3{ if(!(NR%2)) {sub(/\.fermi/,"",$1); n=$1}
                  else {sub("/.*","",$1); print $1"/"n}}' file
    
    fermi-node08/59281
    fermi-node08/59282
    fermi-node08/59283
    fermi-node08/59284
    fermi-node07/59285
    fermi-node07/59286
    fermi-node07/59287
    fermi-node07/59288
    

    You can feed this into a while loop for further processing, such as

    $ while IFS= read -r f; do echo "$f"; done < <(awk ...)
    

    Just replace echo "$f" with whatever you want to do.

    UPDATE: if the number of header lines is not fixed, this may be more robust

    $ awk '/^[0-9]+\.fermi/ {sub(/\.fermi/,"",$1); n=$1; next}
           n {sub("/.*","",$1); print $1"/"n; n=""}' file
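
To tie this back to the copy step asked about in the question, here is a minimal sketch that wraps the pattern-based awk command in a function and adds a copy loop. It is only one possible way to do it. The function names list_job_dirs and copy_job_dirs are made up for illustration, and the source root and destination are placeholders: the sketch assumes every job directory is reachable as <source root>/<node>/<job ID>, which may not match how your cluster exposes node-local storage.

```bash
#!/usr/bin/env bash
# Sketch only: list_job_dirs and copy_job_dirs are hypothetical helper names,
# and the directory layout <src_root>/<node>/<jobid> is an assumption.

# Read qstat -rn style text on stdin; print one "node/jobid" line per job.
list_job_dirs() {
    awk '/^[0-9]+\.fermi/ { sub(/\.fermi.*/, "", $1); job = $1; next }
         job != ""        { sub(/\/.*/, "", $1); print $1 "/" job; job = "" }'
}

# Read "node/jobid" lines on stdin; copy each directory into the destination.
# $1 = source root (assumed layout), $2 = destination directory.
copy_job_dirs() {
    local src_root=$1 dest=$2 f
    mkdir -p "$dest" || return 1
    while IFS= read -r f; do
        [ -n "$f" ] || continue                  # skip empty lines
        if [ -d "$src_root/$f" ]; then
            cp -r "$src_root/$f" "$dest/"        # lands in "$dest/<jobid>"
        else
            echo "skipping $f: no such directory" >&2
        fi
    done
}
```

With those two functions defined, something like qstat -rn -u djsavic | list_job_dirs | copy_job_dirs /path/to/node/root /location would do the whole job, where /path/to/node/root stands for wherever the fermi-node## directories live on your system. The header lines, including fermi:, never match either awk rule, so they drop out without any extra filtering.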