
C Programming - Counting Frequencies Of Characters In A String (Problem In My Code)


I am trying to do this exercise in C programming: "Write a program that takes a string as input and counts the frequency of each character."

I have written this function:

void printCharactersFrequenciesOf(char s[]){
    size_t stringlength = strlen(s); // length of string
    char chars[stringlength]; // variable for the different characters in the string
    int charsFrequencies[stringlength], charAlreadyExists, differentCharsNumber = 0; // variables for the frequency of each character in the string, a flag to know whether the character already exists in the characters array, and for the different characters number in the string
    // putting the different characters of the string in the different characters array
    for (int i = 0; i < stringlength; i++){
        charAlreadyExists = 0;
        for (int j = 0; j < i; j++){
            if (s[i] == chars[j]){
                charAlreadyExists = 1;
                j = i; // break loop
            }
        }
        if (charAlreadyExists == 0){
            chars[differentCharsNumber] = s[i];
            differentCharsNumber++;
        }
    }
    chars[differentCharsNumber] = charsFrequencies[differentCharsNumber] = '\0'; // terminating the different characters array and the characters frequencies array with a null terminator if they're shorter than the length of the string
    int charCount; // a counter variable for the number of appearance of each existing character
    // getting character frequencies into the character frequencies array
    for (int i = 0; i < differentCharsNumber; i++){
        charCount = 0;
        for (int j = 0; j < stringlength; j++){
            if (chars[i] == s[j]){
                charCount++;
            }
        }
        charsFrequencies[i] += charCount;
    }
    // printing the frequencies of the different characters
    for (int i = 0; i < differentCharsNumber; i++){
        printf("Frequency of '%c': %d\n", chars[i], charsFrequencies[i]);
    }
}

In this function, I first put all the distinct characters of the source string into an array. Then I go through that array and, for each character, scan the string to count how often it appears.

Unfortunately, this code doesn't work. It does find and print the different characters of the string, but the frequencies come out as nonsense values.

For example, for the string "Temme" I get:

Frequency of 'T': -1920988639
Frequency of 'e': -23
Frequency of 'm': -606004806

When I expect to get:

Frequency of 'T': 1
Frequency of 'e': 2
Frequency of 'm': 2

However, for the string "bb" I get:

Frequency of 'b': 2

As expected.

I'd like to know what I did wrong, even if this solution is not ideal.

Thanks in advance.


Solution

  • You are overcomplicating a very simple function, and the roundabout algorithm makes your code very hard to read. Simply use an array long enough to hold a count for every character you care about, and index it by the character itself.

    In this example code, I count the characters with codes 32 to 127. You can change the range to include, for example, control characters.

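    #include <stdio.h>
    #include <string.h>
    
    #define MAX_ASCII 127
    #define MIN_ASCII 32
    
    /* Zeroes arr, counts every character of str whose code is in
       [MIN_ASCII, MAX_ASCII], and returns the length of str. */
    size_t count(const char *str, size_t *arr)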
    {
        size_t len = 0;
        if(str && arr)
        {
            memset(arr, 0, (MAX_ASCII - MIN_ASCII + 1) * sizeof(*arr));
            while(*str)
            {
                if(*str >= MIN_ASCII && (unsigned char)*str <= MAX_ASCII)
                {
                    arr[*str - MIN_ASCII] += 1;
                }
                str++;
                len++;
            }
        }
        return len;
    }
    
    
    int main(void)
    {
        size_t freq[MAX_ASCII - MIN_ASCII + 1];
        const char *str = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";
        size_t len = count(str, freq);
        
        printf("Total length of the string is: %zu\n", len);
        for(int i = 0; i <= MAX_ASCII - MIN_ASCII; i++)
        {
            if(freq[i])
                printf("Char %03d ('%c') was found %4zu times (% 6.2f%%)\n", i + MIN_ASCII,
                    i + MIN_ASCII, freq[i], (100.0 * freq[i]) / len);
        }
    }
    

    https://godbolt.org/z/53corfj19

    Result:

    Total length of the string is: 574
    Char 032 (' ') was found   90 times ( 15.68%)
    Char 039 (''') was found    1 times (  0.17%)
    Char 044 (',') was found    4 times (  0.70%)
    Char 046 ('.') was found    4 times (  0.70%)
    Char 048 ('0') was found    3 times (  0.52%)
    Char 049 ('1') was found    2 times (  0.35%)
    Char 053 ('5') was found    1 times (  0.17%)
    Char 054 ('6') was found    1 times (  0.17%)
    Char 057 ('9') was found    1 times (  0.17%)
    Char 065 ('A') was found    1 times (  0.17%)
    Char 073 ('I') was found    6 times (  1.05%)
    Char 076 ('L') was found    5 times (  0.87%)
    Char 077 ('M') was found    1 times (  0.17%)
    Char 080 ('P') was found    1 times (  0.17%)
    Char 097 ('a') was found   28 times (  4.88%)
    Char 098 ('b') was found    5 times (  0.87%)
    Char 099 ('c') was found   10 times (  1.74%)
    Char 100 ('d') was found   16 times (  2.79%)
    Char 101 ('e') was found   59 times ( 10.28%)
    Char 102 ('f') was found    6 times (  1.05%)
    Char 103 ('g') was found   11 times (  1.92%)
    Char 104 ('h') was found   14 times (  2.44%)
    Char 105 ('i') was found   32 times (  5.57%)
    Char 107 ('k') was found    7 times (  1.22%)
    Char 108 ('l') was found   17 times (  2.96%)
    Char 109 ('m') was found   18 times (  3.14%)
    Char 110 ('n') was found   38 times (  6.62%)
    Char 111 ('o') was found   25 times (  4.36%)
    Char 112 ('p') was found   18 times (  3.14%)
    Char 114 ('r') was found   24 times (  4.18%)
    Char 115 ('s') was found   39 times (  6.79%)
    Char 116 ('t') was found   43 times (  7.49%)
    Char 117 ('u') was found   17 times (  2.96%)
    Char 118 ('v') was found    5 times (  0.87%)
    Char 119 ('w') was found    6 times (  1.05%)
    Char 120 ('x') was found    2 times (  0.35%)
    Char 121 ('y') was found   13 times (  2.26%)