
How to use awk to capture multiple patterns in text file into several blocks of text and print each block to a new file


I have this BIND DNS statistics text file, sample_data.txt:

+++ Statistics Dump +++ (1698804161)
++ Incoming Requests ++
            34199522 QUERY
                   2 STATUS
                  12 UPDATE
++ Incoming Queries ++
                   2 RESERVED0
            19539834 A
              203203 NS
              239215 CNAME
               25636 SOA
              235650 PTR
                  96 HINFO
              922800 MX
              616897 TXT
                   5 RP
                  13 AFSDB
                   8 SIG
                   7 KEY
             9112095 AAAA
                  15 LOC
                  18 EID
              339894 SRV
                  75 NAPTR
                   7 KX
                  11 CERT
                 232 A6
                  55 DNAME
                   5 APL
                2172 DS
                  14 SSHFP
                   6 IPSECKEY
                  35 RRSIG
                 183 NSEC
              135429 DNSKEY
                   3 DHCID
                   8 NSEC3
                   6 NSEC3PARAM
                 196 TLSA
                  27 TYPE53
                  21 HIP
                  28 TYPE59
                  20 TYPE60
                  28 TYPE61
                   3 TYPE62
                  73 TYPE63
                 156 TYPE64
             2815625 TYPE65
                2297 SPF
                   7 TYPE108
                  11 TYPE109
                 752 AXFR
                1115 ANY
                   4 DLV
                5530 Others
++ Outgoing Queries ++
[View: default]
[View: _bind]
++ Name Server Statistics ++
            34199536 IPv4 requests received
            33035183 requests with EDNS(0) received
                1433 requests with TSIG received
               74232 TCP requests received
            20645922 auth queries rejected
                4604 recursive queries rejected
                 730 transfer requests rejected
                  12 update requests rejected
            34199536 responses sent
               71843 truncated responses sent
            33035183 responses with EDNS(0) sent
                1433 responses with TSIG sent
            24625387 queries resulted in successful answer
            33852582 queries resulted in authoritative answer
              135913 queries resulted in non authoritative answer
              135913 queries resulted in referral answer
             3911181 queries resulted in nxrrset
                   2 queries resulted in SERVFAIL
             5316014 queries resulted in NXDOMAIN
              210273 other query failures
++ Zone Maintenance Statistics ++
                 234 IPv4 notifies sent
++ Resolver Statistics ++
[Common]
[View: default]
[View: _bind]
++ Cache DB RRsets ++
[View: default]
[View: _bind (Cache: _bind)]
++ Socket I/O Statistics ++
                  27 UDP/IPv4 sockets opened
                   3 TCP/IPv4 sockets opened
                  25 UDP/IPv4 sockets closed
               74330 TCP/IPv4 sockets closed
               74338 TCP/IPv4 connections accepted
                  42 TCP/IPv4 recv errors
++ Per Zone Query Statistics ++
[sampledomain1.com]
             1898118 auth queries rejected
                  77 recursive queries rejected
                  16 transfer requests rejected
                  12 update requests rejected
             5125667 queries resulted in successful answer
            10890351 queries resulted in authoritative answer
               79163 queries resulted in non authoritative answer
               79163 queries resulted in referral answer
             2997088 queries resulted in nxrrset
             2767596 queries resulted in NXDOMAIN
[sampledomain2.com]
            18026742 auth queries rejected
                1945 recursive queries rejected
                  10 transfer requests rejected
            18773892 queries resulted in successful answer
            20863228 queries resulted in authoritative answer
               56644 queries resulted in non authoritative answer
               56644 queries resulted in referral answer
              778332 queries resulted in nxrrset
             1311004 queries resulted in NXDOMAIN
--- Statistics Dump --- (1698804161)

I want to use awk to capture the block of text that follows each [anydomainname] record delimiter, excluding the delimiter line itself, and write each block to its own file. The new files, file1.txt and file2.txt, would then contain:

file1.txt

             1898118 auth queries rejected
                  77 recursive queries rejected
                  16 transfer requests rejected
                  12 update requests rejected
             5125667 queries resulted in successful answer
            10890351 queries resulted in authoritative answer
               79163 queries resulted in non authoritative answer
               79163 queries resulted in referral answer
             2997088 queries resulted in nxrrset
             2767596 queries resulted in NXDOMAIN

file2.txt

            18026742 auth queries rejected
                1945 recursive queries rejected
                  10 transfer requests rejected
            18773892 queries resulted in successful answer
            20863228 queries resulted in authoritative answer
               56644 queries resulted in non authoritative answer
               56644 queries resulted in referral answer
              778332 queries resulted in nxrrset
             1311004 queries resulted in NXDOMAIN

respectively.

Right now, this is what I have working:

 awk '/^\[[[:lower:]]/ {p=1; next};
     /^\[[[:lower:]]/ {p=0};
     {if (p==1) {print last} {last=$0}}' sample_data.txt | tail -n+2

which gets me this:

             1898118 auth queries rejected
                  77 recursive queries rejected
                  16 transfer requests rejected
                  12 update requests rejected
             5125667 queries resulted in successful answer
            10890351 queries resulted in authoritative answer
               79163 queries resulted in non authoritative answer
               79163 queries resulted in referral answer
             2997088 queries resulted in nxrrset
             2767596 queries resulted in NXDOMAIN
            18026742 auth queries rejected
                1945 recursive queries rejected
                  10 transfer requests rejected
            18773892 queries resulted in successful answer
            20863228 queries resulted in authoritative answer
               56644 queries resulted in non authoritative answer
               56644 queries resulted in referral answer
              778332 queries resulted in nxrrset
             1311004 queries resulted in NXDOMAIN

But as you can see, I have two problems.

  1. I still need to split each block to its respective domain section
  2. I need to then output that block of text to a new file.

Can I do this by expanding my current awk command with a BEGIN block, a for loop, and a print to file for each block? I just do not know whether awk can do this the way I am thinking it out. TIA.

EDIT: Expanding my question: I would also like to output a third file containing the block of text before the ++ Per Zone Query Statistics ++ line, that is, everything preceding the first [anydomain] pattern match.

file3.txt

+++ Statistics Dump +++ (1698804161)
++ Incoming Requests ++
            34199522 QUERY
                   2 STATUS
                  12 UPDATE
++ Incoming Queries ++
                   2 RESERVED0
            19539834 A
              203203 NS
              239215 CNAME
               25636 SOA
              235650 PTR
                  96 HINFO
              922800 MX
              616897 TXT
                   5 RP
                  13 AFSDB
                   8 SIG
                   7 KEY
             9112095 AAAA
                  15 LOC
                  18 EID
              339894 SRV
                  75 NAPTR
                   7 KX
                  11 CERT
                 232 A6
                  55 DNAME
                   5 APL
                2172 DS
                  14 SSHFP
                   6 IPSECKEY
                  35 RRSIG
                 183 NSEC
              135429 DNSKEY
                   3 DHCID
                   8 NSEC3
                   6 NSEC3PARAM
                 196 TLSA
                  27 TYPE53
                  21 HIP
                  28 TYPE59
                  20 TYPE60
                  28 TYPE61
                   3 TYPE62
                  73 TYPE63
                 156 TYPE64
             2815625 TYPE65
                2297 SPF
                   7 TYPE108
                  11 TYPE109
                 752 AXFR
                1115 ANY
                   4 DLV
                5530 Others
++ Outgoing Queries ++
[View: default]
[View: _bind]
++ Name Server Statistics ++
            34199536 IPv4 requests received
            33035183 requests with EDNS(0) received
                1433 requests with TSIG received
               74232 TCP requests received
            20645922 auth queries rejected
                4604 recursive queries rejected
                 730 transfer requests rejected
                  12 update requests rejected
            34199536 responses sent
               71843 truncated responses sent
            33035183 responses with EDNS(0) sent
                1433 responses with TSIG sent
            24625387 queries resulted in successful answer
            33852582 queries resulted in authoritative answer
              135913 queries resulted in non authoritative answer
              135913 queries resulted in referral answer
             3911181 queries resulted in nxrrset
                   2 queries resulted in SERVFAIL
             5316014 queries resulted in NXDOMAIN
              210273 other query failures
++ Zone Maintenance Statistics ++
                 234 IPv4 notifies sent
++ Resolver Statistics ++
[Common]
[View: default]
[View: _bind]
++ Cache DB RRsets ++
[View: default]
[View: _bind (Cache: _bind)]
++ Socket I/O Statistics ++
                  27 UDP/IPv4 sockets opened
                   3 TCP/IPv4 sockets opened
                  25 UDP/IPv4 sockets closed
               74330 TCP/IPv4 sockets closed
               74338 TCP/IPv4 connections accepted
                  42 TCP/IPv4 recv errors

Solution

  • This awk should work for you:

    awk -v hdr="file3.txt" '
    /^\+\+ Per Zone Query Statistics/ {
       hdr = ""                # stop writing the header block
    }
    hdr {
       print > hdr             # lines before the per-zone section go to file3.txt
    }
    /^\[[[:lower:]]/ {         # indicates start domain [...]
       close(fn)
       fn = "file" ++f ".txt"  # construct output filename `fn`
       next
    }
    /^[^[:blank:]]/ {          # indicates end of block
       fn = ""
    }
    fn {
       print > fn              # prints each record to fn
    }' sample_data.txt
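
A possible variation, offered only as a sketch: with three or more zones, the counter-based name file3.txt would collide with the header file, and /^\[[[:lower:]]/ skips any zone whose name starts with a digit. The version below names each output file after its zone instead. The names mini_stats.txt and header.txt and the zones flag are made up for this illustration, and it assumes that every [ line after the Per Zone header is a zone name with no slash and no trailing whitespace.

```shell
# Trimmed-down stand-in for sample_data.txt (hypothetical file name)
cat > mini_stats.txt <<'EOF'
+++ Statistics Dump +++ (1698804161)
++ Incoming Requests ++
            34199522 QUERY
++ Outgoing Queries ++
[View: default]
++ Per Zone Query Statistics ++
[sampledomain1.com]
             1898118 auth queries rejected
                  77 recursive queries rejected
[sampledomain2.com]
            18026742 auth queries rejected
--- Statistics Dump --- (1698804161)
EOF

awk -v hdr="header.txt" '
/^\+\+ Per Zone Query Statistics/ {
    hdr = ""; zones = 1        # header block ends, per-zone section begins
    next
}
hdr { print > hdr }            # everything before the per-zone section
zones && /^\[/ {               # a [zone] line starts a new block
    if (fn != "") close(fn)    # release the previous output file
    fn = substr($0, 2, length($0) - 2) ".txt"   # strip the [ and ]
    next
}
/^[^[:blank:]]/ { fn = "" }    # any other unindented line ends the block
fn != "" { print > fn }        # indented lines belong to the current zone
' mini_stats.txt
```

On this input the run should leave header.txt, sampledomain1.com.txt and sampledomain2.com.txt in the current directory. Closing each file before opening the next keeps the number of open descriptors at one, which matters if the real dump has many zones.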