Count the most common occurrences of unknown strings in a file


I have a large file full of lines like this...

19:54:05 10.10.8.5 [SERVER] Response sent: www.example.com. type A by 192.168.4.5
19:55:10 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5
19:55:23 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5

I don't care about any of the other data, only the domain name after "Response sent:". I'd like a sorted list of the most common domain names. The problem is that I won't know all the domain names in advance, so I can't just search for a fixed string.

Using the example above, I'd like the output to be along the lines of:

ns1.example.com (2)
www.example.com (1)

...where the number in ( ) is the number of times that domain name occurs.

How/what could I use to do this on Windows? The input file is a .txt file; the output can be in any format. Ideally this would be a command-line process, but I'm really lost, so I'd be happy with anything.


Solution

  • The cat is kinda out of the bag, so let's try to help a little. This is a PowerShell solution. If you have trouble understanding how it works, I encourage you to research the individual parts.

    If your text file was "D:\temp\test.txt", you could do something like this.

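    # Pull out the text between "sent: " and " type" on every line
    $results = Select-String -Path D:\temp\test.txt -Pattern "(?<=sent: ).+(?= type)" | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
    # Count each distinct name and list the most frequent first
    $results | Group-Object | Select-Object Name,Count | Sort-Object Count -Descending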
    

    Using your sample input, you would get this output:

    Name             Count
    ----             -----
    ns1.example.com.     2
    www.example.com.     1
    
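The table above still carries the trailing dot that DNS logs put on fully qualified names, and it is a table rather than the `name (count)` lines asked for. A minimal sketch of one way to get exactly that, assuming every relevant line has the `sent: <name> type` layout shown in the question: `TrimEnd('.')` drops the trailing dot, the `-f` format operator builds each line, and `Set-Content` writes the result to a file. The temporary sample file and the output name `domain-counts.txt` are placeholders for illustration only; point `$inFile` at your real log (e.g. D:\temp\test.txt).

```powershell
# Sample input: the three lines from the question, written to a temporary file.
# Placeholder only: replace $inFile with the path to your own log.
$inFile = [System.IO.Path]::GetTempFileName()
@(
    '19:54:05 10.10.8.5 [SERVER] Response sent: www.example.com. type A by 192.168.4.5'
    '19:55:10 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5'
    '19:55:23 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5'
) | Set-Content -Path $inFile

# Hypothetical output file next to the input; any writable path will do.
$outFile = Join-Path (Split-Path $inFile) 'domain-counts.txt'

# For each match: take the matched name and drop the trailing root dot,
# group identical names, sort by count (highest first), format as "name (count)".
$lines = Select-String -Path $inFile -Pattern '(?<=sent: ).+(?= type)' |
    ForEach-Object { $_.Matches[0].Value.TrimEnd('.') } |
    Group-Object |
    Sort-Object -Property Count -Descending |
    ForEach-Object { '{0} ({1})' -f $_.Name, $_.Count }

# Save the result and also show it on screen
$lines | Set-Content -Path $outFile
$lines
```

With the sample lines this prints (and saves) `ns1.example.com (2)` followed by `www.example.com (1)`. `Group-Object` compares strings case-insensitively by default, which suits DNS names; add `-CaseSensitive` if you need exact matching.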

    Since a regex is involved, I have saved a link that explains how it works.
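As a rough breakdown of the pattern, here is the same regex applied to a single line from the question with the `-match` operator, with each part annotated in comments. The variable names are only illustrative.

```powershell
# One line from the question's log
$line = '19:55:10 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5'

# (?<=sent: )  lookbehind: the match must be preceded by "sent: ", which is not part of the match
# .+           one or more of any character (greedy, then backtracks to satisfy the lookahead)
# (?= type)    lookahead: the match must be followed by " type", which is not part of the match
$pattern = '(?<=sent: ).+(?= type)'

if ($line -match $pattern) {
    # $Matches[0] holds the whole match (only the text between the lookarounds)
    $domain = $Matches[0]
}
$domain   # ns1.example.com.
```

`-match`, like `Select-String`, is case-insensitive by default, so `Sent:` would match as well. Because `.+` is greedy, a line containing ` type` more than once would match up to the last occurrence; `.+?` would stop at the first.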

    Please keep in mind that SO is, of course, a site that helps programmers and programming enthusiasts. We are devoting our free time, whereas some people get paid to do this.