Tags: string, bash, gnu-coreutils, acronym

How to grep a string in bash with letters out of order?


My task is to find strings (acronyms) that repeat in a specific text file.

Here follows a sample:

...
the
the
het
het
het
teh
teh
teh
teh
...

As a first step, I can count how many times each of them appears with this command:

cat text_file.txt | sort | uniq -c | sort -gr

And the output is something like this:

4 teh
3 het
2 the

But I also need to sum these three counts, because the strings use the same three characters, just in a different order.

Can you please give me some help with this?


Solution

  • With GNU awk, which splits a string into characters when given a null separator and supports PROCINFO["sorted_in"] to control array traversal order:

    $ cat tst.awk
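    {
        # Split the line into single characters (null separator: GNU awk extension).
        split($0,chars,"")
        # Visit the characters in ascending order of their values.
        PROCINFO["sorted_in"] = "@val_str_asc"
        key = ""
        for (i in chars) {
            key = key chars[i]
        }
        # Words made of the same letters share one sorted key, hence one counter.
        cnt[key]++
    }
    END {
        # Print the keys in ascending string order.
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (key in cnt) {
            print key, cnt[key]
        }
    }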
    
    $ cat file
    the
    het
    teh
    foobar
    fobar
    oofrab
    
    $ awk -f tst.awk file
    abfoor 2
    abfor 1
    eht 3
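
If GNU awk is not available, roughly the same idea (sort the letters of each word to build a key, then count the keys) can be sketched with standard tools only. This is a minimal sketch, not a drop-in replacement: the function name `anagram_counts` is made up for illustration, it assumes one word per line and plain ASCII letters, and it is slower than the awk script because it starts several processes per input line.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration): reads words on stdin,
# one per line, and prints "count key" pairs, highest count first.
# Assumes ASCII input; fold -w1 is not reliable for multibyte characters.
anagram_counts() {
    while IFS= read -r word; do
        # Put each character on its own line, sort the characters,
        # then join them back into a single key.
        printf '%s\n' "$word" | fold -w1 | LC_ALL=C sort | tr -d '\n'
        # Terminate the key with a newline.
        echo
    done | LC_ALL=C sort | uniq -c | sort -rn
}

# Sample data from the question: 2 x the, 3 x het, 4 x teh.
printf '%s\n' the the het het het teh teh teh teh | anagram_counts
```

With the sample from the question, all nine lines collapse into the single key eht with a count of 9. The count comes first here because of `uniq -c`, and its leading padding depends on the implementation.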