bash awk sed scripting cut

Extract string from many brackets


I have a file with this content:

    ok: [10.9.22.122] => {
        "out.stdout_lines": [
            "cgit-1.1-11.el7.x86_64",
            "python-paramiko-2.1.1-0.9.el7.noarch",
            "varnish-libs-4.0.5-1.el7.x86_64",
            "kernel-3.10.0-862.el7.x86_64"
        ]
    }
    ok: [10.9.33.123] => {
        "out.stdout_lines": [
            "python-paramiko-2.1.1-0.9.el7.noarch"
        ]
    }

    ok: [10.9.44.124] => {
        "out.stdout_lines": [
            "python-paramiko-2.1.1-0.9.el7.noarch",
            "kernel-3.10.0-862.el7.x86_64"
        ]
    }

    ok: [10.9.33.29] => {
        "out.stdout_lines": []
    }
    ok: [10.9.22.28] => {
        "out.stdout_lines": [
            "NetworkManager-tui-1:1.12.0-8.el7_6.x86_64",
            "java-1.8.0-openjdk-javadoc-zip-debug-1:1.8.0.171-8.b10.el7_5.noarch",
            "java-1.8.0-openjdk-src-1:1.8.0.171-8.b10.el7_5.x86_64",
            "kernel-3.10.0-862.el7.x86_64",
            "kernel-tools-3.10.0-862.el7.x86_64",
        ]
    }

    ok: [10.2.2.2] => {
        "out.stdout_lines": [
            "monitorix-3.10.1-1.el6.noarch",
            "singularity-runtime-2.6.1-1.1.el6.x86_64"
        ]
    }

    ok: [10.9.22.33] => {
        "out.stdout_lines": [
            "NetworkManager-1:1.12.0-8.el7_6.x86_64",
            "gnupg2-2.0.22-5.el7_5.x86_64",
            "kernel-3.10.0-862.el7.x86_64",
        ]
    }

I need to extract the IP between [] whenever that block's out.stdout_lines contains a kernel* package.

I want to "emulate" a substring operation: save each 'block' of content into a variable and go through the whole file.
How would I do this with sed, or another tool, given that there are several delimiters?


Solution

  • A GNU awk solution:

    awk -F'\\]|\\[' 'tolower($3)~/"out.stdout_lines" *:/ && tolower($4)~/"kernel/{print "The IP " $2 " cointain Kernel"}' RS='}' file
    

    Output:

    The IP 10.9.22.122 cointain Kernel
    The IP 10.9.44.124 cointain Kernel
    The IP 10.9.22.28 cointain Kernel
    The IP 10.9.22.33 cointain Kernel
    

    I used ] or [ as the field separator (FS) and } as the record separator (RS),
    so the IP is simply $2.
    This solution depends on the structure: "out.stdout_lines" must be in the field right after [ip], as in your example.
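To see why the IP ends up in $2, it can help to print the fields of a single record. The following is only a sketch: the function name `show_fields` and the one-host sample are made up for illustration, and whitespace inside each field is collapsed purely so that every field prints on one line.

```shell
# show_fields (hypothetical helper): split each "}"-terminated record
# on "[" or "]" and list the resulting fields.
show_fields() {
    awk -F'\\]|\\[' -v RS='}' '
        NF > 1 {                           # skip the whitespace-only tail after the last "}"
            for (i = 1; i <= NF; i++) {
                f = $i
                gsub(/[ \t\n]+/, " ", f)   # collapse whitespace so each field prints on one line
                printf "$%d=<%s>\n", i, f
            }
        }
    ' "$@"
}

# Hypothetical one-host sample shaped like the blocks in the question.
printf '%s\n' \
    'ok: [10.9.22.122] => {' \
    '    "out.stdout_lines": [' \
    '        "kernel-3.10.0-862.el7.x86_64"' \
    '    ]' \
    '}' | show_fields
```

For this sample, $2 holds the IP, $3 holds the text up to and including "out.stdout_lines":, and $4 holds the package list, which is exactly what the first command tests.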

    Another GNU awk approach, without the above limitation:

    awk -F']' 'match(tolower($0),/"out\.stdout_lines": *\[([^\]]+)/,m){if(m[1]~/"kernel/)print "The IP " substr($1, index($1,"[")+1) " cointain Kernel"}' RS='}' file
    

    Same output. The tolower calls make the match case-insensitive. If you want an exact match, you can remove them or just use the solutions from Revision 6.

    A third way, combining the merits of the two above:

    awk -F'\\]|\\[' 'match(tolower($0),/"out\.stdout_lines": *\[([^\]]+)/,m){if(m[1]~/"kernel/)print "The IP " $2 " cointain Kernel"}' RS='}' file
    

    Change tolower($0) to $0 if you don't need case-insensitive matching.
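The question also asks about saving a block into a variable and walking through the whole file. A line-by-line variant along those lines is sketched below. It is not one of the commands above: the function name `kernel_hosts` is invented for illustration, it assumes every host block starts with a line of the form `ok: [ip]`, it matches case-sensitively, and it prints only the bare IP. It avoids the three-argument match(), so it should not need GNU awk.

```shell
# kernel_hosts (hypothetical helper): print the IP of every block that
# lists at least one package whose name starts with "kernel".
kernel_hosts() {
    awk '
        /^[ \t]*ok: \[/ {                  # first line of a new host block
            start = index($0, "[")
            end = index($0, "]")
            ip = substr($0, start + 1, end - start - 1)
            seen = 0                       # nothing reported for this block yet
            next
        }
        !seen && /"kernel/ {               # first kernel package in the current block
            print ip
            seen = 1                       # report each IP only once
        }
    ' "$@"
}

# Hypothetical two-host sample shaped like the blocks in the question.
printf '%s\n' \
    'ok: [10.9.22.122] => {' \
    '    "out.stdout_lines": [' \
    '        "kernel-3.10.0-862.el7.x86_64",' \
    '        "kernel-tools-3.10.0-862.el7.x86_64"' \
    '    ]' \
    '}' \
    'ok: [10.9.33.123] => {' \
    '    "out.stdout_lines": [' \
    '        "python-paramiko-2.1.1-0.9.el7.noarch"' \
    '    ]' \
    '}' | kernel_hosts
```

With a real input, the call would be `kernel_hosts file`. The `seen` flag is what keeps a host with several kernel packages from being printed more than once.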