Search code examples
canagram

Anagram tester in C


I am trying to implement an anagram tester in C. When calling the program, the user enters two words in double quotes, like "listen" and "silent". I have almost got it to work, but I am having some trouble with a helper function I wrote to get rid of spaces in the two input words. Here is the code for this function:

void noSpaces(char word[100]) {
    /*
    This is a function to get rid of spaces in a word
    It does this by scanning for a space and shifting the
    array elements at indices > where the space is
    down by 1 as long as there is still a space
    there. 
    */
    for (int i = 0; i < 100; i++) {
        while (word[i] == ' ') {
            for (int j = i; j < 100; j++) {
                word[j] = word[j+1];
            }
        }
    }
}

Now, it works fine when I pass the input word from the main function to this helper. The problem is the second call to this function. When I call this function on the second input, if k is the number of spaces in the first input, then the function erases the first k letters of the second input. For example, typing ./anagram " banana" "banana" will give me a false negative, and if I add a print statement to see what's going on with the inputs after noSpaces is called on them, I get the following:

banana
anana

Here is the code for the full program:

#include <stdio.h>

int main(int argc, char *argv[]) {
    //this if statement checks for empty entry
    if (isEmpty(argv[1]) == 0 || isEmpty(argv[2]) == 0) {
        //puts("one of these strings is empty");
        return 1;
    }
    //call to noSpaces to eliminate spaces in each word
    noSpaces(argv[1]);
    noSpaces(argv[2]);
    //call to sortWords
    sortWords(argv[1]);
    sortWords(argv[2]);
    int result = compare(argv[1], argv[2]);
    /*
    if (result == 1) {
        puts("Not anagrams");
    } else {
        puts("Anagrams");
    }
    */
    return result;
}

int compare(char word1[100], char word2[100]) {
    /*
    This is a function that accepts two sorted 
    char arrays (see 'sortWords' below) and
    returns 1 if it finds a different character
    at entry i in either array, or 0 if at no 
    index the arrays have a different character.
    */
    int counter = 0;
    while (word1[counter] != '\0' && word2[counter] != '\0') {
        if (word1[counter] != word2[counter]) {
            //printf("not anagrams\n");
            return 1;
        }
        counter++;
    }
    // printf("anagrams\n");
    return 0;
}

void sortWords(char word[100]) {
    /*
    This is a function to sort the input char arrays
    it's a simple bubble sort on the array elements.
    'sortWords' function accepts a char array and returns void,
    sorting the entries in alphabetical order
    being careful about ignoring the 'special character'
    '\0'.
    */
    for (int j = 0; j < 100; j++) {
        int i = 0;
        while (word[i + 1] != '\0') {
            if (word[i] > word[i + 1]) {
                char dummy = word[i + 1];
                word[i + 1] = word[i];
                word[i] = dummy;
            }
            i++;
        }
    }
}

void noSpaces(char word[100]) {
    /*
    This is a function to get rid of spaces in a word
    It does this by scanning for a space and shifting the
    array elements at indices > where the space is
    down by 1 as long as there is still a space there. 
    */
    for (int i = 0; i < 100; i++) {
        while (word[i] == ' ') {
            for (int j = i; j < 100; j++) {
                word[j] = word[j + 1];
            }
        }
    }
}

int isEmpty(char word[100]) {
    // if a word consists of the empty character, it's empty
    //otherwise, it isn't
    if (word[0] == '\0') {
        return 0;
    }
    return 1;
}

I know there is a library that can deal with strings, but I really would like to avoid having to use it. I've come this far already without needing it, and I feel the problem is mostly solved but for one small thing I don't see.

I come from a java background, and I'm new to C, if that explains whatever error I made.


Solution

  • In C, strings are arrays of char with a null terminator, ie a byte with the value 0 usually represented as '\0'. You should not assume any particular length such as 100. Indeed the array size in the function prototype arguments is ignored by the compiler. You can determine the length by scanning for the null terminator, which is what strlen() does efficiently, or you can write the code in such a way as to avoid multiple scans, stopping at the null terminator. You should make sure your functions work for the empty string, which is an array with a single null byte. Here are the problems in your code:

    In function noSpaces, you iterate beyond the end of the string, modifying memory potentially belonging to the next string. The program has undefined behavior.

    You should stop at the end of the string. Also use 2 index variables to perform in linear time:

    void noSpaces(char word[]) {
        /*
        This is a function to get rid of spaces in a word
        It does this by scanning for a space and shifting the
        array elements at indices > where the space is
        down by 1 as long as there is still a space
        there. 
        */
        int i, j;
        for (i = j = 0; word[i] != '\0'; i++) {
            if (word[i] != ' ') {
                word[j++] = word[i];
            }
        }
        word[j] = '\0';
    }
    

    You can simplify compare to use a third as many tests on average:

    int compare(const char word1[], const char word2[]) {
        /*
        This is a function that accepts two sorted 
        char arrays (see 'sortWords' below) and
        returns 1 if it finds a different character
        at entry i in either array, or 0 if at no 
        index the arrays have a different character.
        */
        for (int i = 0; word1[i] == word2[i]; i++) {
            if (word1[i]) == '\0')
                //printf("anagrams\n");
                return 0;
            }
        }
        // printf("not anagrams\n");
        return 1;
    }
    

    sortWords has undefined behavior for the empty string because you read the char at index 1, beyond the end of the array. Here is a corrected version:

    void sortWords(char word[]) {
        /*
        This is a function to sort the input char arrays
        it's a simple bubble sort on the array elements.
        'sortWords' function accepts a char array and returns void,
        sorting the entries in alphabetical order
        being careful about ignoring the 'special character'
        '\0'.
        */
        for (int j = 0; word[j] != '\0'; j++) {
            for (int i = 1; i < j; i++) {
                if (word[i - 1] > word[i]) {
                    char dummy = word[i - 1];
                    word[i - 1] = word[i];
                    word[i] = dummy;
                }
            }
        }
    }
    

    You should declare functions before use, or alternately define them before use. Your code compiles because the compiler accepts old style C where the prototype for yet unseen functions was inferred from the arguments passed at the first call site. This practice is error prone and obsolete.

    Your sorting function has quadratic time complexity, which can be very slow for very long strings but words should not be too large so this is not a problem.

    It would be better to not modifying the argument strings. You can perform the test with a copy of one of the strings with the same time complexity.

    Here is a direct approach:

    #include <stdio.h>
    
    int check_anagrams(const char word1[], const char word2[]) {
        /*
           This function accepts two strings and returns 1 if they
           are anagrams of one another, ignoring spaces.
           The strings are not modified.
         */
        int i, j, len1, letters1, letters2;
    
        /* compute the length and number of letters of word1 */
        for (len1 = letters1 = 0; word1[len1] != '\0'; len1++) {
            if (word1[len1] != ' ')
                letters1++;
        }
    
        /* create a copy of word1 in automatic storage */
        char copy[len1];    /* this is an array, not a string */
        for (i = 0; i < len1; i++)
            copy[i] = word1[i];
    
        for (j = letters2 = 0; word2[j] != '\0'; j++) {
            char temp = word2[j];
            if (temp != ' ') {
                letters2++;
                for (i = 0; i < len1; i++) {
                    if (copy[i] == temp) {
                        copy[i] = '\0';
                        break;
                    }
                }
                if (i == len1) {
                    /* letter was not found */
                    return 0;
                }
            }
        }
        if (letters1 != letters2)
            return 0;
        return 1;
    }
    
    int main(int argc, char *argv[]) {
        const char *s1 = " listen";
        const char *s2 = "silent   ";
        if (argc >= 3) {
            s1 = argv[1];
            s2 = argv[2];
        }
        int result = check_anagrams(s1, s2);
        if (result == 0) {
            printf("\"%s\" and \"%s\" are not anagrams\n", s1, s2);
        } else {
            printf("\"%s\" and \"%s\" are anagrams\n", s1, s2);
        }
        return result;
    }