While reading through the glibc source code, I found this interesting comment in strcat.c. Can anyone explain how this optimization works?
    /* Make S1 point before the next character, so we can increment
       it while memory is read (wins on pipelined cpus). */
    s1 -= 2;
    do
      {
        c = *s2++;
        *++s1 = c;
      }
    while (c != '\0');
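Pipelined CPUs can do some things in parallel. For instance, a pipelined CPU can increment the address held in s1 while the read through s2 is still in progress, because the increment does not depend on the value being read. By the time the character arrives, the destination address for the store is already computed.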
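To make the pointer arithmetic concrete, here is a minimal sketch of a complete function built around that loop, next to a conventional version for comparison. The names strcat_preinc and strcat_plain are hypothetical, and the scan for the end of the destination is an assumption about the surrounding code rather than a quote from glibc. Note that for an empty destination the `s1 -= 2` step forms a pointer one element before the buffer, which strict ISO C does not permit even though it works on common platforms.

```c
#include <stddef.h>

/* Hypothetical name: a sketch built around the loop quoted in the question. */
char *strcat_preinc(char *dest, const char *src)
{
    char *s1 = dest;
    const char *s2 = src;
    char c;

    /* Scan to the end of dest. After this loop s1 points one past
       the terminating '\0'. */
    do
        c = *s1++;
    while (c != '\0');

    /* Back up two so that the first ++s1 lands exactly on the '\0'.
       Caveat: if dest is empty this points before the buffer, which
       is not strictly valid ISO C. */
    s1 -= 2;

    do {
        c = *s2++;   /* load the next source character */
        *++s1 = c;   /* the address bump does not depend on the load */
    } while (c != '\0');

    return dest;
}

/* Conventional version for comparison (also a hypothetical name). */
char *strcat_plain(char *dest, const char *src)
{
    char *s1 = dest;

    while (*s1 != '\0')
        s1++;
    while ((*s1++ = *src++) != '\0')
        ;
    return dest;
}
```

Both functions produce the same result. The only difference is how the store address is computed, and whether that matters on a given compiler and CPU is something you would have to measure.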