linux bash grep uniq

Map query string parameter occurrences


I have a log file with many query strings, for example:

param1=val1&param2=asd&p3=fgh&p4=jkl&width=100

I want to count how often each unique value occurs for each parameter.

I tried replacing '&' with a newline, then sorting and counting the distinct values with the following command:

tr '&' '\n' | sort | uniq -c | sort -nr

but this ranks all the parameters together by count, and I need the results grouped by key. For example, the current output is:

2 width=1440
13 width=480
3 width=540
9 param1=3
8 param2=4
7 param1=2

Requested output:

13 width=480
3 width=540
2 width=1440
..
9 param1=3
7 param1=2
..
8 param2=4

Update, better example:

$ cat test1.txt 
param1=val1&param2=asd&p3=fgh&p4=jkl&width=100
param1=val1&param2=asd&p3=fgh&p4=jkl&width=100
param1=val1&param2=asd&p3=fgh&p4=jkl&width=300
param1=val2&param2=asdf&p3=fgh3&p4=j3kl&width=200

$ cat test1.txt | tr '&' '\n'
param1=val1
param2=asd
p3=fgh
p4=jkl
width=100
param1=val1
param2=asd
p3=fgh
p4=jkl
width=100
param1=val1
param2=asd
p3=fgh
p4=jkl
width=300
param1=val2
param2=asdf
p3=fgh3
p4=j3kl
width=200
$ cat test1.txt | tr '&' '\n' | sort | uniq -c | sort -nr
      3 param2=asd
      3 param1=val1
      3 p4=jkl
      3 p3=fgh
      2 width=100
      1 width=300
      1 width=200
      1 param2=asdf
      1 param1=val2
      1 p4=j3kl
      1 p3=fgh3

Expected output, grouped by parameter key:

      3 param1=val1
      1 param1=val2
..
      2 width=100
      1 width=300
      1 width=200
...
      3 param2=asd
      1 param2=asdf
...
      3 p4=jkl
      1 p4=j3kl
...
      3 p3=fgh
      1 p3=fgh3

I couldn't decide whether the Unix forum would be a better place for this question.


Solution

  • You can do this with an awk + sort + sed pipeline:

    awk -F '&' -v OFS='=' '{for (i=1; i<=NF; i++) freq[$i]++}
        END{for (i in freq) print freq[i], i}' file |
    sort -t= -k2,2r -k1,1nr |
    sed 's/=/ /'
    
    2 width=100
    1 width=200
    1 width=300
    3 param2=asd
    1 param2=asdf
    3 param1=val1
    1 param1=val2
    3 p4=jkl
    1 p4=j3kl
    3 p3=fgh
    1 p3=fgh3
    
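    • The awk command splits each line on & (the input field separator), counts how often each name=value pair occurs, and prints count=name=value lines, because the output field separator is set to =.
    • The sort command uses = as its delimiter. It sorts on field 2 (the parameter name) in reverse order to group the lines by key, then on field 1 (the count) numerically in descending order within each group.
    • The sed command replaces the first = on each line with a space to produce the requested format.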
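
To make the three steps above easier to try out, here is a minimal sketch that wraps the same pipeline in a shell function. The function name count_params is hypothetical and not part of the original answer. The LC_ALL=C prefix on sort is also an addition, assumed here only to keep the ordering the same across locales. The function reads from the files given as arguments, or from standard input when none are given.

```shell
#!/bin/sh
# count_params: hypothetical helper name, not from the original answer.
# Prints "count name=value" lines grouped by parameter name.
count_params() {
    awk -F '&' -v OFS='=' '
        # Tally every name=value pair on the line.
        { for (i = 1; i <= NF; i++) freq[$i]++ }
        # Emit count=name=value (OFS joins the two print arguments).
        END { for (pair in freq) print freq[pair], pair }
    ' "$@" |
    # Field 2 is the parameter name, field 1 the count.
    # LC_ALL=C is an assumption added for a locale-independent order.
    LC_ALL=C sort -t= -k2,2r -k1,1nr |
    # Replace only the first "=" so "count=name=value" becomes "count name=value".
    sed 's/=/ /'
}

# Sample input copied from the question.
count_params <<'EOF'
param1=val1&param2=asd&p3=fgh&p4=jkl&width=100
param1=val1&param2=asd&p3=fgh&p4=jkl&width=100
param1=val1&param2=asd&p3=fgh&p4=jkl&width=300
param1=val2&param2=asdf&p3=fgh3&p4=j3kl&width=200
EOF
```

Because the keys are sorted in reverse, width comes first and p3 last. Changing -k2,2r to -k2,2 would list the groups in ascending key order instead.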