How does strcmp() work?


I've been looking around a fair bit for an answer. I'm going to make a series of my own string functions like my_strcmp(), my_strcat(), etc.

Does strcmp() work through each index of two arrays of characters, and if the ASCII value is smaller at an identical index of the two strings, is that string therefore alphabetically greater, so that a 0, 1, or 2 is returned? I guess what I'm asking is: does it use the ASCII values of the characters to return these results?

Any help would be greatly appreciated.

[REVISED]

OK, so I have come up with this... it works for all cases except when the second string is greater than the first.

Any tips?

#include <stdio.h>

int my_strcmp(char s1[], char s2[])
{   
    int i = 0;
    while ( s1[i] != '\0' )
    {
        if( s2[i] == '\0' ) { return 1; }
        else if( s1[i] < s2[i] ) { return -1; }
        else if( s1[i] > s2[i] ) { return 1; }
        i++;
    }   
    return 0;
}


int main (int argc, char *argv[])
{
    int result = my_strcmp(argv[1], argv[2]);

    printf("Value: %d \n", result);

    return 0;

}

Solution

  • The pseudo-code "implementation" of strcmp would go something like:

    define strcmp (str1, str2):
        p1 = address of first character of str1
        p2 = address of first character of str2
    
        while p1 not at end of str1:
            if p2 at end of str2: 
                return 1
    
            if contents of p2 greater than contents of p1:
                return -1
    
            if contents of p1 greater than contents of p2:
                return 1
    
            advance p1
            advance p2
    
        if p2 not at end of str2:
            return -1
    
        return 0
    

    That's basically it. Each character is compared in turn and a decision is made as to whether the first or second string is greater (a), based on that character.

    Only if the characters are identical do you move to the next character and, if all the characters were identical, zero is returned.

    Note that you will not necessarily get 1 and -1. The C standard only requires a negative value, zero, or a positive value, so you should always check the return value with < 0, > 0 or == 0.

    Turning that into real C would result in something like this:

    int myStrCmp (const char *str1, const char *str2) {
        const unsigned char *p1 = (const unsigned char *) str1;
        const unsigned char *p2 = (const unsigned char *) str2;
    
        while (*p1 != '\0') {
            if (*p2 == '\0') return  1;
            if (*p2 > *p1)   return -1;
            if (*p1 > *p2)   return  1;
    
            p1++;
            p2++;
        }
    
        if (*p2 != '\0') return -1;
    
        return 0;
    }
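
One way to gain some confidence in a hand-written comparison is to check that it agrees in sign with the library strcmp() on a few pairs, including the case from the question where the second string is longer. The following is only a sketch: myStrCmp() is repeated so the block compiles on its own, and sign_of(), agrees_with_strcmp() and check_examples() are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <string.h>

/* Copy of myStrCmp() from above, repeated so this sketch compiles alone. */
int myStrCmp(const char *str1, const char *str2) {
    const unsigned char *p1 = (const unsigned char *) str1;
    const unsigned char *p2 = (const unsigned char *) str2;

    while (*p1 != '\0') {
        if (*p2 == '\0') return  1;
        if (*p2 > *p1)   return -1;
        if (*p1 > *p2)   return  1;
        p1++;
        p2++;
    }
    if (*p2 != '\0') return -1;
    return 0;
}

/* Hypothetical helper: reduce any comparison result to -1, 0, or 1. */
int sign_of(int r) {
    return (r > 0) - (r < 0);
}

/* Hypothetical helper: 1 if myStrCmp() and strcmp() agree in sign. */
int agrees_with_strcmp(const char *a, const char *b) {
    return sign_of(myStrCmp(a, b)) == sign_of(strcmp(a, b));
}

/* Hypothetical self-check; assert() aborts on the first mismatch. */
void check_examples(void) {
    assert(agrees_with_strcmp("abc", "abc"));   /* equal strings */
    assert(agrees_with_strcmp("abc", "abd"));   /* differ in last character */
    assert(agrees_with_strcmp("abc", "abcd"));  /* second string longer */
    assert(agrees_with_strcmp("abcd", "abc"));  /* first string longer */
    assert(agrees_with_strcmp("", "a"));        /* empty first string */
}
```

Only the sign is compared, never the exact value, because the library function is free to return any negative or positive number.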
    

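    (a) Keep in mind that "greater" in the context of characters is not necessarily based on simple ASCII ordering for all string functions. strcmp() itself compares the characters as unsigned char values, regardless of locale.

    C has a concept called 'locales', which specify (amongst other things) the collation, that is, the ordering of the underlying character set. Locale-aware functions such as strcoll() compare according to that collation.

    With such functions you may therefore find, for example, that the characters from the set {a, á, à, ä} are sorted together, or in some locales even considered identical, when comparing.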
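
The standard strcoll() function is one place where locale collation comes into play. Below is a minimal sketch of comparing two strings under a named collation locale. The helper name collate_sign() and its "return 2 when the locale is unavailable" convention are assumptions made for illustration, and which locale names exist (other than "C" and "") depends on the platform.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: compare a and b with strcoll() under the given
   LC_COLLATE locale. Returns -1, 0, or 1, or 2 if the locale cannot be
   selected. The previous collation locale is restored when possible. */
int collate_sign(const char *locale_name, const char *a, const char *b) {
    char saved[256];
    const char *current = setlocale(LC_COLLATE, NULL);
    int have_saved = 0;
    int r;

    /* setlocale() may reuse its buffer, so keep a private copy. */
    if (current != NULL && strlen(current) < sizeof saved) {
        strcpy(saved, current);
        have_saved = 1;
    }

    if (setlocale(LC_COLLATE, locale_name) == NULL) {
        return 2;                       /* locale not available here */
    }

    r = strcoll(a, b);

    if (have_saved) {
        setlocale(LC_COLLATE, saved);   /* put the old locale back */
    }
    return (r > 0) - (r < 0);
}

/* Hypothetical self-check: in the "C" locale strcoll() orders strings
   the same way strcmp() does. */
void check_c_locale(void) {
    assert(collate_sign("C", "a", "b") == -1);
    assert(collate_sign("C", "same", "same") == 0);
}
```

Calling collate_sign() with a name such as "en_US.UTF-8" (assuming that locale is installed) may order accented and unaccented letters differently from strcmp(); the exact result depends on the system's locale data.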