I'm writing a function in C that will take a string and remove all characters that are not a lowercase alphabetic character. I have this code written so far:
void strclean(char* str) {
while (*str) {
if (!(*str >= 'a' && *str <= 'z')) {
strcpy(str, str + 1);
str--;
}
str++;
}
}
When I pass it the string "hello[][]world" the function seems to mostly work, except the output is:
hellowoldd
When I make it print after every line that it enters the if statement, here is the output I receive:
hello][]woldd
hello[]woldd
hello]woldd
hellowoldd
It seems to be really close but I can't understand why it would be producing this output! The weirdest part is I have given the code to two other friends and it works fine on their computers. We are all running the same version of Linux (ubuntu 14.04.3), and are all using gcc to compile.
I'm not sure if there is an issue with the code that would cause inconsistency of the output, or if it's a compiler issue that's creating the problem. Maybe it has to do with strcpy on my machine compared to theirs?
The strcpy
function is not guaranteed to work if the ranges overlap, as they do in your case. From C11 7.24.2.3 The strcpy function /2
(my emphasis):
The
strcpy
function copies the string pointed to bys2
(including the terminating null character) into the array pointed to bys1
. If copying takes place between objects that overlap, the behavior is undefined.
You can use something like memmove
, which does work with overlapping ranges, as per C11 7.24.2.2 The memmove function /2
:
The
memmove
function copiesn
characters from the object pointed to bys2
into the object pointed to bys1
. Copying takes place as if then
characters from the object pointed to bys2
are first copied into a temporary array ofn
characters that does not overlap the objects pointed to bys1
ands2
, and then then
characters from the temporary array are copied into the object pointed to bys1
.
But there's a better solution that's O(n)
rather than O(n2)
in time complexity, while still being overlap-safe:
void strclean (char* src) {
// Run two pointers in parallel.
char *dst = src;
// Process every source character.
while (*src) {
// Only copy (and update destination pointer) if suitable.
// Update source pointer always.
if (islower(*src)) *dst++ = *src;
src++;
}
// Finalise destination string.
*dst = '\0';
}
You'll notice I'm also using islower()
(from ctype.h
) to detect lower case alphabetic characters. This is more portable since the C standard does not mandate that the alpha characters have consecutive code points (the digits are the only ones guaranteed to be consecutive).
There's also no separate need to check for isalpha()
since, as per C11 7.4.1.2 The isalpha function /2
, islower() == true
implies isalpha() == true
:
The
isalpha
function tests for any character for whichisupper
orislower
is true, or ...