Count occurrences of each distinct substring in one column of a file with UNIX tools

I would like to count how many times each distinct substring appears in the set of strings held in the 2nd column of a tab-separated file. My approach is to split the column into its substrings and then try to count them, but it does not work correctly.

The input looks like this:

rs12255619 A/C chr10    AA AA AC AA AA AA AA AA AA AC AA
rs7909677 A/G chr10     AA AA AA AA AA AA AA AA AA AA AA

The desired output is:

rs12255619 A/C chr10    AA AA AC AA AA AA AA AA AA AC AA   AA=9;AC=2
rs7909677 A/G chr10     AA AA AA AA AA AA AA AA AA AA CC   AA=10;CC=1

and so on....

awk 'BEGIN {FS=OFS="\t"} {gf=split($2,gfp," ")} {for (i=1;i<=gf;i++){
                                      if (gfp[i]=="AA"){i++; printf $1FS$2FS"%s\n" i, gfp[i]}
                                      else if (gfp[i]=="AC" || gfp[i] == "CA"){i++; printf $1FS$2FS"%s"gfp[i]"="i";\n"}
                                                            }}' input > output

I also tried another script, but I think it repeats each count as many times as the substring occurs in the row. Here I performed a second split inside the first one to distinguish between the substrings:

awk 'BEGIN {FS=OFS="\t"} {gf=split($2,gfp," ");} {for (i=1;i<=gf;i++){

                     par=gfp[i];
                     gfeach=split($2,gfpeach,par);
                     print par "=" gfeach[i]";"
                                              }
                      }' input > output

I'm sure there are easier ways to do this, but I cannot solve it completely. Is it possible in a UNIX environment? Thanks in advance.

Solution

Your input doesn't match your output, so we're all just guessing, but this might be what you want:

$ cat tst.awk
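BEGIN { FS=OFS="\t" }
{
    # start each line with an empty set of counts
    delete cnt
    # break the 2nd tab-separated field into its space-separated pieces
    split($2,tmp,/ /)
    for (i in tmp) {
        str = tmp[i]
        cnt[str]++
    }

    # print the original line, then the counts as "str=count" joined by ";"
    printf "%s", $0
    sep = OFS
    for (str in cnt) {
        printf "%s%s=%d", sep, str, cnt[str]
        sep = ";"
    }
    print ""
}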

Depending on what your input really is, the above will output one of the following:

$ cat file
rs12255619 A/C chr10    AA AA AC AA AA AA AA AA AA AC AA
rs7909677 A/G chr10     AA AA AA AA AA AA AA AA AA AA AA

$ awk -f tst.awk file
rs12255619 A/C chr10    AA AA AC AA AA AA AA AA AA AC AA        AA=9;AC=2
rs7909677 A/G chr10     AA AA AA AA AA AA AA AA AA AA AA        AA=11

$ cat file
rs12255619 A/C chr10    AA AA AC AA AA AA AA AA AA AC AA
rs7909677 A/G chr10     AA AA AA AA AA AA AA AA AA AA CC

$ awk -f tst.awk file
rs12255619 A/C chr10    AA AA AC AA AA AA AA AA AA AC AA        AA=9;AC=2
rs7909677 A/G chr10     AA AA AA AA AA AA AA AA AA AA CC        AA=10;CC=1
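
One caveat: the "for (str in cnt)" loop visits the array keys in an order awk leaves unspecified, so on another awk the pairs could come out as AC=2;AA=9 instead. If the order matters to you, here is a minimal sketch that prints the counts in the order each string first appears on the line. The function name count_genotypes and the helper array "order" are names made up for this illustration, and skipping empty pieces is my assumption about how doubled spaces should be treated:

```shell
# Hypothetical helper: read tab-separated lines on stdin and append
# "string=count" pairs for the 2nd field, in first-seen order.
count_genotypes() {
    awk '
    BEGIN { FS = OFS = "\t" }
    {
        # Forget the counts and the ordering from the previous line.
        delete cnt
        n = 0

        # Walk the pieces by index so they are visited left to right.
        m = split($2, tmp, / /)
        for (i = 1; i <= m; i++) {
            str = tmp[i]
            if (str == "") continue               # assumption: ignore empty pieces
            if (!(str in cnt)) order[++n] = str   # remember first appearance
            cnt[str]++
        }

        # Print the original line, then the counts in first-seen order.
        printf "%s", $0
        sep = OFS
        for (j = 1; j <= n; j++) {
            printf "%s%s=%d", sep, order[j], cnt[order[j]]
            sep = ";"
        }
        print ""
    }'
}

# Example: AC appears first, so it is reported first.
printf 'rs1 A/C chr10\tAC AA AC AA AA\n' | count_genotypes
```

You would run it on a real file with: count_genotypes < file > output. The counting itself is unchanged from tst.awk; only the final loop differs, walking a numerically indexed list instead of the keys of cnt.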