Though common sense and the literature are clear about the behaviour of strcmp():
int strcmp( const char *lhs, const char *rhs );
Negative value if lhs appears before rhs in lexicographical order.
Zero if lhs and rhs compare equal.
Positive value if lhs appears after rhs in lexicographical order.
I can't seem to make it return any values other than -1, 0 and 1.
Sure, the behaviour is consistent with the definition, but I was expecting values bigger than 1 or smaller than -1, since the definition asserts that results will be <0, 0 or >0, not -1, 0 or 1.
I tested this in several compilers and libraries with the same results. I would like to see an example where that's not the case.
#include <stdio.h>
#include <string.h>
int main()
{
    printf("%d ", strcmp("a", "a"));
    printf("%d ", strcmp("abc", "aaioioa"));
    printf("%d ", strcmp("eer", "tsdf"));
    printf("%d ", strcmp("cdac", "cdac"));
    printf("%d ", strcmp("zsdvfgh", "ertgthhgj"));
    printf("%d ", strcmp("abcdfg", "rthyuk"));
    printf("%d ", strcmp("ze34", "ze34"));
    printf("%d ", strcmp("er45\n", "io\nioa"));
    printf("%d", strcmp("jhgjgh", "cdgffd"));
}
Result: 0 1 -1 0 1 -1 0 -1 1
The C standard clearly says (C11 §7.24.4.2 The strcmp function):
The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.
It doesn't say how much greater than or less than zero the result must be; a function that always returns -1, 0 or +1 meets the standard, and so does a function that sometimes returns values with a magnitude larger than 1, such as -27, 0 or +35. If your code is to conform to the C standard, it must not assume either set of results; it may only assume that the sign of the result is correct.
Here is an implementation of strcmp() (named str_cmp() here so that the result can be compared with strcmp()) which does not return -1 or +1:
#include <string.h>
#include <stdio.h>
static int str_cmp(const char *s1, const char *s2)
{
    /* Skip the common prefix; stop at the first difference or at the end of both strings */
    while (*s1 == *s2 && *s1 != '\0')
        s1++, s2++;
    /* Compare the differing bytes as unsigned char values */
    int c1 = (int)(unsigned char)*s1;
    int c2 = (int)(unsigned char)*s2;
    return (c1 - c2);
}
int main(void)
{
    printf("%d ", strcmp("a", "a"));
    printf("%d ", strcmp("abc", "aAioioa"));
    printf("%d\n", strcmp("eer", "tsdf"));
    printf("%d ", str_cmp("a", "a"));
    printf("%d ", str_cmp("abc", "aAioioa"));
    printf("%d\n", str_cmp("eer", "tsdf"));
    return 0;
}
When run on a Mac (macOS Mojave 10.14.6; GCC 9.2.0; Xcode 11.13.1), I get the output:
0 1 -1
0 33 -15
I did change your data slightly: "aaioioa" became "aAioioa". The overall result is no different (but the value 33, which is 'b' - 'A', is bigger than the 1 you'd get with the original string); the return value is less than, equal to, or greater than zero as required.
The str_cmp() function is a legitimate implementation and is loosely based on a historically common implementation of strcmp(). It takes slightly more care over the return value, but you can find two minor variants of it on p106 of Brian W Kernighan and Dennis M Ritchie, The C Programming Language, 2nd Edn (1988), one using array indexing, the other using pointers:
int strcmp(char *s, char *t)
{
    int i;
    for (i = 0; s[i] == t[i]; i++)
        if (s[i] == '\0')
            return 0;
    return s[i] - t[i];
}

int strcmp(char *s, char *t)
{
    for ( ; *s == *t; s++, t++)
        if (*s == '\0')
            return 0;
    return *s - *t;
}
The K&R code might not return the expected result if the plain char type is signed and one of the strings contains 'accented characters', that is, characters in the range -128 .. -1 (or 0x80 .. 0xFF when viewed as unsigned values). The casts in my str_cmp() code make it treat the data as unsigned char; the (int) cast isn't really necessary because the assignment to an int variable performs that conversion anyway. The subtraction of two unsigned char values converted to int produces a result in the range -255 .. +255. Evidently, though, the modern C libraries that return only -1, 0 or +1 don't use direct subtraction like that.
Note that the C11 standard §7.24.4 String comparison functions says:
The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.
You can look at How do I check if a value matches a string? The outline there shows:
if (strcmp(first, second) == 0)    // first equal to second
if (strcmp(first, second) <= 0)    // first less than or equal to second
if (strcmp(first, second) < 0)     // first less than second
if (strcmp(first, second) >= 0)    // first greater than or equal to second
if (strcmp(first, second) > 0)     // first greater than second
if (strcmp(first, second) != 0)    // first unequal to second
Note how comparing to zero uses the same comparison operator as the test you're making.
You could (but probably shouldn't) write:
if (strcmp(first, second) <= -1) // first less than second
if (strcmp(first, second) >= +1) // first greater than second
You'd still get the same results, but it is not sensible to do so; always comparing with zero is easier and more uniform.
You can get a -1, 0, +1 result using:
unsigned char c1 = *s1;
unsigned char c2 = *s2;
return (c1 > c2) - (c1 < c2);
For unrestricted integers (rather than integers restricted to 0 .. 255), this idiom is safe because it cannot overflow, whereas subtraction can overflow and give the wrong result (formally, signed integer overflow is undefined behaviour). For the restricted values involved with 8-bit characters, overflow on subtraction is not an issue.