Tags: bash, awk, grep, cut

Finding a string after grep


I have this file:

    a=1 b=2 1234j12342134h d="a v" id="y_123456" something else
    a=1 b=2 1234j123421341 d="a" something else
    a=1 b=2 1234j123421342 d="a D v id=" id="y_123458" something else
    a=1 b=2 1234j123421344 d="a  v" something else
    a=1 b=2 1234j123421346 d="a.a." id="y_123410" something else

I want to retrieve only the lines that contain 'id=', and print only the 3rd column and the value of id. The output should be:

    1234j12342134h id="y_123456"
    1234j123421342 id="y_123458"
    1234j123421346 id="y_123410"

or

    1234j12342134h "y_123456"
    1234j123421342 "y_123458"
    1234j123421346 "y_123410"

or even

    1234j12342134h y_123456
    1234j123421342 y_123458
    1234j123421346 y_123410

I tried grep -o with patterns for the beginning and end of the expression, but that misses the first block of IDs. I also tried awk, but splitting into columns fails because some quoted values contain spaces.
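A quick way to see the awk problem: with the default FS, awk splits on every run of blanks, including the blanks inside quoted values such as d="a v", so the id token does not sit in a fixed column. The sketch below prints, for each line, the number of the first field that begins with id=". The function name id_field_numbers is only an illustrative helper, and `file` is assumed to hold the sample above.

```shell
# Hypothetical helper: for each line, print the number of the first
# blank-separated field (default awk FS) that starts with id=".
# Lines with no such field print nothing.
id_field_numbers() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^id="/) { print NR ": field " i; break }
  }' "$@"
}

# Usage: id_field_numbers file
```

On the sample this reports field 6 for line 1, field 7 for line 3 and field 5 for line 5. On line 3 the hit is even the stray id=" inside the d value rather than the real id, which is why matching a regex against the whole record ($0) is more robust than picking a column.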

I got it working in Java, but it gets slow as the log files grow.

How can I do it using bash utilities?


Solution

  • With GNU awk (for the third argument to match()):

    $ gawk 'match($0,/id="[^" ]+"/,a){ print $3, a[0] }' file
    1234j12342134h id="y_123456"
    1234j123421342 id="y_123458"
    1234j123421346 id="y_123410"
    

    With other awks:

    $ awk 'match($0,/id="[^" ]+"/){ print $3, substr($0,RSTART,RLENGTH) }' file
    1234j12342134h id="y_123456"
    1234j123421342 id="y_123458"
    1234j123421346 id="y_123410"
    

    Or, if you want to strip the leading id=" and the closing quote, a couple of ways would be:

    $ gawk 'match($0,/id="([^" ]+)"/,a){ print $3, a[1] }' file
    1234j12342134h y_123456
    1234j123421342 y_123458
    1234j123421346 y_123410
    

    or:

    $ awk 'match($0,/id="[^" ]+"/){ print $3, substr($0,RSTART+4,RLENGTH-5) }' file
    1234j12342134h y_123456
    1234j123421342 y_123458
    1234j123421346 y_123410
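None of the commands above prints the middle format from the question, where the value keeps its quotes. A minimal sketch, assuming any POSIX awk and using a hypothetical wrapper name ids_quoted, is to shift the substr() offsets used in the last command: RSTART+3 skips only the three characters id=, and RLENGTH-3 keeps everything from the opening quote through the closing quote.

```shell
# Hypothetical wrapper: print the 3rd field plus the quoted id value.
# RSTART points at the "i" of id="...", so RSTART+3 is the opening quote;
# RLENGTH-3 runs from there to the closing quote.
ids_quoted() {
  awk 'match($0, /id="[^" ]+"/) { print $3, substr($0, RSTART + 3, RLENGTH - 3) }' "$@"
}
```

For example:

    $ ids_quoted file
    1234j12342134h "y_123456"
    1234j123421342 "y_123458"
    1234j123421346 "y_123410"

On line 3 the regex does not stop at the id=" inside the d value: [^" ]+ needs at least one character that is neither a quote nor a blank right after the opening quote, and that id=" is followed by a blank, so the match lands on the real id="y_123458".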