How do I count multiple overlapping strings and get the total occurrences per line (awk or anything else)?

I have an input file like this:

315secondbin    x12121321211332123x
315firstbin 3212212121x
315thirdbin 132221312
316firstbin 121
316secondbin    1212

What I want to do is count how many instances of a few different strings (say "121" and "212") occur in each line, counting overlaps. So my expected output would be:

So I slightly modified some awk from another thread to use the OR operator in hopes that it would count up everything that meets either condition:

{
count = 0
$0 = tolower($0)
while (length() > 0) {
    m = match($0, /212/ || /121/)
    if (m == 0)
         break
    count++
    $0 = substr($0, m + 1)
}
print count
}

Unfortunately, my output is this:

But if I leave out the OR it counts perfectly. What am I doing wrong?

Also, I run the script on the file ymaz.txt by running:

 cat ymaz.txt | awk -v "pattern=" -f count3.awk

As an alternate approach I tried this:

{
count = 0
$0 = tolower($0)
while (length() > 0) {
    m = match($0, /212/)
    y = match($0, /121/)
    if ((m == 0) && (y == 0))
         break
    count++
    $0 = substr($0, (m + 1) + (y + 1))
}
print count
}

but my output was this:

What am I doing wrong? I know I should be understanding the code and not cutting and pasting stuff together, but that's my skill level at this point.

BTW, when I don't have the OR in there (i.e. I'm just searching for one string) it works perfectly.

Solution

You're making it too complicated:

{
    count=0
    while ( match($0,/121|212/) ) {
        count++
        $0=substr($0,RSTART+1)
    }
    print count
}

$ awk -f tst.awk file
6
5
0
1
2
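
If you'd rather not overwrite $0 while counting (say you still want to print the original line afterwards), the same loop can be wrapped in a function that works on a copy of the string. This is only a sketch: the names count_overlaps and count are made up for illustration, and the regexp is passed as a string ("121|212"), which awk treats as a dynamic regexp.

```shell
# count_overlaps is a hypothetical wrapper name, not part of the original answer.
# It reads from the files given as arguments, or from stdin if there are none.
count_overlaps() {
    awk '
    # Return the number of (possibly overlapping) matches of regexp re in str.
    # n is a local variable; str is a copy, so $0 is left untouched.
    function count(str, re,    n) {
        n = 0
        while (match(str, re)) {
            n++
            str = substr(str, RSTART + 1)   # restart one char past the match start
        }
        return n
    }
    { print count($0, "121|212") }
    ' "$@"
}
```

Run as `count_overlaps file` on the sample input from the question, this should print the same 6, 5, 0, 1, 2 as the script above, since the loop is the same and only the string being chopped up has changed.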

Your fundamental problem is that you were confusing a condition with a regexp. A regexp can be compared with a string to form a condition, and when the string in question is $0 you can leave it out and just use regexp as shorthand for $0 ~ regexp, but in that context what's being tested is still a condition. The 2nd arg for match() is a regexp, not a condition. | is the "or" operator in a regexp, while || is the "or" operator in a condition. /.../ are the regexp delimiters.

/foo/ is a regexp

$0 ~ /foo/ is a condition

/foo/ in a conditional context is shorthand for $0 ~ /foo/ but in any other context is just a regexp.

/foo/ || /bar/ in a conditional context is shorthand for $0 ~ /foo/ || $0 ~ /bar/, but as the 2nd arg to match() awk actually assumes you intended to write:

match($0,($0 ~ /foo/ || $0 ~ /bar/))

i.e. it will test the current record against foo or bar, and if either matches, that condition evaluates to 1 and that 1 is then given to match() as its 2nd arg.

Look:

$ echo foo | gawk 'match($0,/foo/||/bar/)'        
$ echo foo | gawk '{print /foo/||/bar/}'  
1
$ echo 1foo | gawk 'match($0,/foo/||/bar/)'       
1foo
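
To see the two forms side by side, here is a small sketch that calls match() both ways on each input line and prints the two return values. The function name compare_match and the variable names wrong and right are made up for this illustration.

```shell
# compare_match is a hypothetical helper used only to contrast the two calls.
compare_match() {
    awk '{
        wrong = match($0, /foo/ || /bar/)   # 2nd arg is the condition result: 0 or 1
        right = match($0, /foo|bar/)        # 2nd arg is a regexp using alternation
        print wrong, right
    }'
}
```

With this, `echo foo | compare_match` should print `0 1` (the condition is true, so match() looks for the string "1" in "foo" and finds nothing, while the real regexp matches at position 1), and `echo 1foo | compare_match` should print `1 2`.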

Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.