windows powershell batch-file dns logfile-analysis

Count the most common occurrences of unknown strings in a file

I have a large file full of lines like this...

19:54:05 10.10.8.5 [SERVER] Response sent: www.example.com. type A by 192.168.4.5
19:55:10 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5
19:55:23 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5

I don't care about any of the other data, only what comes after "Response sent:". I'd like a sorted list of the most common domain names. The problem is that I won't know all the domain names in advance, so I can't just search for a fixed string.

Using the example above, I'd like the output to be along the lines of:

ns1.example.com (2)
www.example.com (1)

...where the number in parentheses is the number of times that name occurs.

What could I use to do this on Windows, and how? The input file is a .txt file; the output can be in any format. Ideally this would be a command-line process, but I'm really lost, so I'd be happy with anything.

Solution

The cat is kind of out of the bag, so let's try to help a little. This is a PowerShell solution. If you have trouble understanding how it works, I encourage you to research the individual parts.

If your text file were "D:\temp\test.txt", you could do something like this:

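# Extract the text between "sent: " and " type" on each line
$results = Select-String -Path D:\temp\test.txt -Pattern "(?<=sent: ).+(?= type)" | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
# Count each distinct name and list the most frequent first
$results | Group-Object | Select-Object Name, Count | Sort-Object Count -Descending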

Using your sample input, you would get this output:

Name             Count
----             -----
ns1.example.com.     2
www.example.com.     1
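That output is a table, and the names keep the trailing dot from the log. If you want the exact `name (count)` shape from the question, here is a minimal sketch. It assumes you don't want the trailing dot. `Get-DomainCount` is just a hypothetical helper name, and the inline sample lines stand in for the real file:

```powershell
# Hypothetical helper: turns log lines into "name (count)" strings,
# most frequent first.
function Get-DomainCount {
    param([string[]]$Line)

    # Same regex as above; TrimEnd('.') drops the trailing dot
    # (an assumption about what you want), then count, sort and format.
    $Line |
        Select-String -Pattern '(?<=sent: ).+(?= type)' |
        ForEach-Object { $_.Matches[0].Value.TrimEnd('.') } |
        Group-Object |
        Sort-Object Count -Descending |
        ForEach-Object { '{0} ({1})' -f $_.Name, $_.Count }
}

# Sample lines from the question, standing in for the real log file
$sample = @(
    '19:54:05 10.10.8.5 [SERVER] Response sent: www.example.com. type A by 192.168.4.5'
    '19:55:10 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5'
    '19:55:23 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5'
)

# Prints:
# ns1.example.com (2)
# www.example.com (1)
Get-DomainCount -Line $sample
```

For the real file, something like `Get-DomainCount -Line (Get-Content D:\temp\test.txt) | Set-Content D:\temp\counts.txt` should work (the output path is only an example). Keep in mind that `Get-Content` reads the whole file into memory before the counting starts.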

Since this uses a regular expression, I have saved a link that explains how it works.
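In short, the pattern `(?<=sent: ).+(?= type)` has three parts: a lookbehind that requires "sent: " right before the match, `.+` for the name itself, and a lookahead that requires " type" right after it. Neither lookaround ends up in the matched text. A quick way to check this on one sample line is .NET's `[regex]::Match`:

```powershell
# One line from the question's log
$line = '19:54:05 10.10.8.5 [SERVER] Response sent: www.example.com. type A by 192.168.4.5'

# (?<=sent: )  lookbehind: must be preceded by "sent: " (not captured)
# .+           one or more characters: the domain name
# (?= type)    lookahead: must be followed by " type" (not captured)
$match = [regex]::Match($line, '(?<=sent: ).+(?= type)')

$match.Success   # True
$match.Value     # www.example.com.
```

Unlike `Select-String`, `[regex]::Match` is case-sensitive by default, which doesn't matter here because "sent" is lowercase in the log. Also note that `.+` is greedy: if a line ever contained " type" more than once, the match would run to the last one, and `.+?` would stop at the first.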

Please keep in mind that SO is, of course, a site that helps programmers and programming enthusiasts. We devote our free time to this, whereas some people get paid to do it.