
Grep Multiple words using a pattern


I have a requirement to extract several fields from a log file, but only from the lines that match a pattern.

Below is a snapshot of the log, access.log:

12.12.137.16 - RMC1 [06/Jul/2016:07:34:17 -0700] "GET /identity/afr/partition/ie/n/default/opt/grid-11.1.1.9.0-5358.js HTTP/1.1" 200 9318 
12.12.137.16 - BMC1 [06/Jul/2016:07:34:17 -0700] "GET /identity/ HTTP/1.1" 200 6788 
12.12.137.16 - RMC1 [06/Jul/2016:07:34:17 -0700] "GET /identity/afr/partition/ie/n/default/opt/status-11.1.1.9.0-5358.js HTTP/1.1" 200 2297 
12.12.137.16 - RMC1 [06/Jul/2016:07:34:17 -0700] "GET /identity/afr/partition/ie/n/default/opt/poll-11.1.1.9.0-5358.js HTTP/1.1" 200 2098 
12.12.137.16 - RMC1 [06/Jul/2016:07:34:18 -0700] "GET /identity/afr/alta-v1/overflow_right_ena.png HTTP/1.1" 200 1082 
12.12.137.16 - RMC1 [06/Jul/2016:07:34:18 -0700] "GET /identity/ HTTP/1.1" 200 6749 
12.12.137.16 - RMC1 [06/Jul/2016:07:34:18 -0700] "GET /identity/afr/alta-v1/conv_l_ena.png HTTP/1.1" 200 1161 
12.12.137.16 - RMC1 [06/Jul/2016:07:34:24 -0700] "GET /identity/ HTTP/1.1" 200 6799 
12.12.137.16 - RMC1 [06/Jul/2016:07:34:27 -0700] "GET /identity/images/Dashboard/myAccess_s2.png HTTP/1.1" 200 6885 
12.12.137.16 - SSS1 [06/Jul/2016:07:34:24 -0700] "POST /identity/faces/home?_adf.ctrl-state=o9l9q161v_5 HTTP/1.1" 200 41776 

I want to print the username and time fields from the lines that match the pattern /identity/ HTTP/1.1 in the log file.

So my output should be:

BMC1 06/Jul/2016:07:34:17
RMC1 06/Jul/2016:07:34:18 
RMC1 06/Jul/2016:07:34:24

I tried:

grep -E '/identity/ HTTP/1.1' *.log

But it prints the whole line.

Please assist.


Solution

  • Using awk

    $ awk -F'[][ ]+' '/\/identity\/ HTTP\/1[.]1/{print $3,$4}' access.log 
    BMC1 06/Jul/2016:07:34:17
    RMC1 06/Jul/2016:07:34:18
    RMC1 06/Jul/2016:07:34:24
    

    How it works:

    • -F'[][ ]+'

      This sets the field separator to any run of one or more [, ], or space characters.

    • /\/identity\/ HTTP\/1[.]1/{print $3,$4}

      This selects the lines that contain /identity/ HTTP/1.1 (the [.] matches a literal dot) and prints just the third and fourth fields, which are the username and the time.
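
To see why the username and time land in $3 and $4, it can help to print the fields that this separator produces for a single line. The following is only a minimal sketch: the helper name show_fields is made up for illustration, and the sample line is copied from the log excerpt in the question.

```shell
# One sample line taken from the access.log excerpt in the question.
line='12.12.137.16 - BMC1 [06/Jul/2016:07:34:17 -0700] "GET /identity/ HTTP/1.1" 200 6788'

# show_fields is a hypothetical helper: it prints the first five fields,
# one per line, using the same field separator as the awk command above.
show_fields() {
    printf '%s\n' "$1" | awk -F'[][ ]+' '{ for (i = 1; i <= 5; i++) print i ": " $i }'
}

show_fields "$line"
# 1: 12.12.137.16
# 2: -
# 3: BMC1
# 4: 06/Jul/2016:07:34:17
# 5: -0700
```

Because " [" and "] " each count as a single separator, the brackets never appear in any field.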

  • Using sed

    $ sed -n '\|/identity/ HTTP/1[.]1|{s/^.* - //; s/[[]//; s/[]].*//; p;}' access.log 
    BMC1 06/Jul/2016:07:34:17 -0700
    RMC1 06/Jul/2016:07:34:18 -0700
    RMC1 06/Jul/2016:07:34:24 -0700
    

    How it works:

    • -n

      This tells sed not to print anything unless we explicitly ask it to.

    • \|/identity/ HTTP/1[.]1|

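      This selects the lines of interest. Using | instead of / as the delimiter of the address regex means the slashes in the pattern need no escaping.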

    • s/^.* - //; s/[[]//; s/[]].*//

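      For the selected lines, these three substitution commands remove the unwanted parts of the line: the first deletes everything up to and including " - ", the second deletes the [, and the third deletes the ] and everything after it. The timezone offset stays in the output.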

    • p

      This tells sed to print what is left of the selected lines after our substitutions were made.
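
The output above still carries the -0700 offset, which the requested output does not have. One possible variation, offered as a sketch rather than a tested general solution, is to start the last substitution at the space before the offset. It assumes the offset always looks like a sign followed by digits; the function name user_and_time and the temporary sample file are illustrative only.

```shell
# Build a small sample file from lines in the question's access.log excerpt.
log=$(mktemp)
cat > "$log" <<'EOF'
12.12.137.16 - RMC1 [06/Jul/2016:07:34:17 -0700] "GET /identity/afr/partition/ie/n/default/opt/grid-11.1.1.9.0-5358.js HTTP/1.1" 200 9318
12.12.137.16 - BMC1 [06/Jul/2016:07:34:17 -0700] "GET /identity/ HTTP/1.1" 200 6788
12.12.137.16 - RMC1 [06/Jul/2016:07:34:18 -0700] "GET /identity/ HTTP/1.1" 200 6749
12.12.137.16 - RMC1 [06/Jul/2016:07:34:24 -0700] "GET /identity/ HTTP/1.1" 200 6799
12.12.137.16 - SSS1 [06/Jul/2016:07:34:24 -0700] "POST /identity/faces/home?_adf.ctrl-state=o9l9q161v_5 HTTP/1.1" 200 41776
EOF

# user_and_time is a hypothetical wrapper around the sed command above.
# The last substitution now begins at the space before the offset, so
# " -0700]" and everything after it is removed (assumes a signed numeric offset).
user_and_time() {
    sed -n '\|/identity/ HTTP/1[.]1|{s/^.* - //; s/[[]//; s/ [-+][0-9]*[]].*//; p;}' "$1"
}

result=$(user_and_time "$log")
printf '%s\n' "$result"
# BMC1 06/Jul/2016:07:34:17
# RMC1 06/Jul/2016:07:34:18
# RMC1 06/Jul/2016:07:34:24

rm -f "$log"
```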

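  • Using grep -P

    If your grep supports the -P flag (Perl-compatible regular expressions), -o prints only the matched text, and the lookbehind and lookahead keep the surrounding context out of the match: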

    $ grep -oP '(?<= - ).*(?= "GET /identity/ HTTP/1\.1)' access.log 
    BMC1 [06/Jul/2016:07:34:17 -0700]
    RMC1 [06/Jul/2016:07:34:18 -0700]
    RMC1 [06/Jul/2016:07:34:24 -0700]
    

    If it is important to get rid of [ and ], we can use:

    $ grep -oP '(?<= - ).*(?=] "GET /identity/ HTTP/1\.1)' access.log | tr -d '['
    BMC1 06/Jul/2016:07:34:17 -0700
    RMC1 06/Jul/2016:07:34:18 -0700
    RMC1 06/Jul/2016:07:34:24 -0700
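
If -P is not available, or the timezone offset should be dropped as well, a plain grep piped into cut and tr may be enough. This is a rough sketch, not part of the answer above: it assumes the fields are separated by exactly one space, as in the excerpt from the question, and it uses a temporary sample file for illustration.

```shell
# Build a small sample file from lines in the question's access.log excerpt.
log=$(mktemp)
cat > "$log" <<'EOF'
12.12.137.16 - RMC1 [06/Jul/2016:07:34:17 -0700] "GET /identity/afr/partition/ie/n/default/opt/grid-11.1.1.9.0-5358.js HTTP/1.1" 200 9318
12.12.137.16 - BMC1 [06/Jul/2016:07:34:17 -0700] "GET /identity/ HTTP/1.1" 200 6788
12.12.137.16 - RMC1 [06/Jul/2016:07:34:18 -0700] "GET /identity/ HTTP/1.1" 200 6749
12.12.137.16 - RMC1 [06/Jul/2016:07:34:24 -0700] "GET /identity/ HTTP/1.1" 200 6799
12.12.137.16 - SSS1 [06/Jul/2016:07:34:24 -0700] "POST /identity/faces/home?_adf.ctrl-state=o9l9q161v_5 HTTP/1.1" 200 41776
EOF

# grep -F treats the pattern as a fixed string, so "." needs no escaping.
# cut splits on single spaces: field 3 is the user, field 4 is "[date:time".
# tr then deletes the leading "[" from the timestamp.
result=$(grep -F '/identity/ HTTP/1.1' "$log" | cut -d' ' -f3,4 | tr -d '[')
printf '%s\n' "$result"
# BMC1 06/Jul/2016:07:34:17
# RMC1 06/Jul/2016:07:34:18
# RMC1 06/Jul/2016:07:34:24

rm -f "$log"
```

Unlike the awk separator, cut does not merge repeated delimiters, so this pipeline would pick the wrong fields if a log line contained two consecutive spaces.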