Are pass-by-value structs pushed onto the stack?

Do C/C++ compilers pass structs by value on the stack, copying hundreds of bytes onto it if the programmer passes a large struct? Does returning a struct incur the same penalty?

Solution

Yes, the compiler will almost certainly do something like a memcpy to copy a struct or class of hundreds of bytes onto the stack if that's what you asked for. If it didn't, something like this wouldn't work:

std::string s = "A large amount of text";

std::string r = rev(s);
std::cout << s << " reversed is " << r << std::endl; 

...
std::string rev(std::string s)
{
   std::string::size_type len = s.length();
   for(std::string::size_type i = 0; i < len / 2; i++)
   {
      std::swap(s[i], s[len - 1 - i]);   // the last valid index is len - 1
   }
   return s;
}

This is why it's nearly always recommended to pass by const reference when possible: only a pointer to the object is passed, not a copy of it.
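To make the difference between the two ways of passing observable, here is a minimal sketch. The names Big, g_copies, peek_by_value and peek_by_const_ref are made up for this illustration; the user-written copy constructor exists only so that each copy can be counted.

```cpp
#include <cstring>

int g_copies = 0;  // hypothetical counter, incremented on every copy of Big

struct Big
{
    char data[200];

    Big() { std::memset(data, 0, sizeof data); }

    // Counting copy constructor: copies the 200 bytes and records the copy.
    Big(const Big& other)
    {
        std::memcpy(data, other.data, sizeof data);
        ++g_copies;
    }
};

// By value: the caller has to construct a fresh Big for the parameter.
char peek_by_value(Big b) { return b.data[0]; }

// By const reference: only the address of the caller's object is passed.
char peek_by_const_ref(const Big& b) { return b.data[0]; }

// Returns how many copies one by-value call makes.
int copies_made_by_value()
{
    Big b;
    g_copies = 0;
    peek_by_value(b);
    return g_copies;
}

// Returns how many copies one by-const-reference call makes.
int copies_made_by_const_ref()
{
    Big b;
    g_copies = 0;
    peek_by_const_ref(b);
    return g_copies;
}
```

Calling copies_made_by_value() should report one copy and copies_made_by_const_ref() none. Of course, a function that needs to modify its own private copy (like rev here) has to pay for that copy somewhere.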

Since the std::string example above got objected to, here's another example, this time with the character buffer stored inside the object itself:

#include <cstddef>   // size_t
#include <cstring>   // strcpy, strlen
#include <utility>   // std::swap

class mystring
{
    char s[200];
    size_t len;
  public:
    mystring(const char *aS)
    {
       strcpy(s, aS);
       len = strlen(s);
    }
    char& operator[](int index)
    {
       return s[index];
    }
    size_t length() 
    { 
       return len; 
    }
};

mystring str("Some long string");
mystring rev = rev_my_str(str);

mystring rev_my_str(mystring s)
{
   size_t len = s.length();
   for(size_t i = 0; i < len / 2; i++)
   {
      std::swap(s[i], s[len - 1 - i]);   // the last valid index is len - 1
   }
   return s;
}

In fact, this makes room for two mystring objects on the stack: one for the copy of s going into rev_my_str, and one for the return value.

Edit:

Assembler generated by g++ -O1 [1] for the call to rev_my_str as above. The interesting bit is the rep movsq along with the setup of %ecx, %rsi and %rdi (count, source and destination, respectively). $26 is the number of 8-byte units that it will copy: 26 * 8 = 208 bytes, which is the 200-byte array plus the 8-byte len. %rsp is the stack pointer. This is almost exactly how a memcpy would look if it were inlined in a simple form [the actual memcpy most likely does a whole bunch of extra work to deal with unaligned start/end, use SSE instructions, etc].

movl    $26, %ecx
movq    %rsp, %rdi
movq    %rbx, %rsi
rep movsq
leaq    416(%rsp), %rdi
call    _Z10rev_my_str8mystring
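The $26 count in the listing above can be checked against the object layout. The following is a small sketch, assuming a typical 64-bit target where size_t is 8 bytes; MyStringLayout, kBytes and kQuadwords are hypothetical names that just mirror the data members of mystring.

```cpp
#include <cstddef>

// Hypothetical mirror of mystring's data members, for size arithmetic only.
struct MyStringLayout
{
    char s[200];
    std::size_t len;
};

// Total object size in bytes (200 + 8 when size_t is 8 bytes wide).
constexpr std::size_t kBytes = sizeof(MyStringLayout);

// Number of 8-byte units a rep movsq has to move to copy the whole object.
constexpr std::size_t kQuadwords = kBytes / 8;
```

On such a target kBytes comes out as 208 and kQuadwords as 26, matching the value loaded into %ecx. Other platforms may lay the struct out differently.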

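And rev_my_str itself looks like this. Note the rep movsq at the bottom of the function. That's where it stores the resulting string back, into the destination whose address the caller passed in %rdi (saved in %rax at the top of the function).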

 _Z10rev_my_str8mystring:
.LFB990:
.cfi_startproc
movq    %rdi, %rax
movq    208(%rsp), %r9
movq    %r9, %r10
shrq    %r10
je  .L5
addq    $1, %r10
movl    $1, %edx
.L6:
movl    %r9d, %ecx
subl    %edx, %ecx
leaq    7(%rsp), %rsi
addq    %rdx, %rsi
movzbl  (%rsi), %edi
movslq  %ecx, %rcx
movzbl  8(%rsp,%rcx), %r8d
movb    %r8b, (%rsi)
movb    %dil, 8(%rsp,%rcx)
addq    $1, %rdx
cmpq    %r10, %rdx
jne .L6
.L5:
movl    $26, %ecx
movq    %rax, %rdi
leaq    8(%rsp), %rsi
rep movsq
ret
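Read back into C++, the two listings behave roughly like the hand-written sketch below. This is only an approximation of what the compiler generated, not its actual output: Str200, rev_lowered and call_rev are made-up names, and Str200 is a plain-struct stand-in for mystring.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical plain-struct stand-in for mystring (200 chars plus a length).
struct Str200
{
    char s[200];
    std::size_t len;
};

// Approximation of the compiled function: 'result' plays the role of the
// hidden return-slot pointer passed in %rdi, 'arg' is the caller's stack copy.
Str200* rev_lowered(Str200* result, Str200* arg)
{
    for (std::size_t i = 0; i < arg->len / 2; ++i)
    {
        std::swap(arg->s[i], arg->s[arg->len - 1 - i]);
    }
    std::memcpy(result, arg, sizeof *result);  // the rep movsq before ret
    return result;                             // handed back in %rax
}

// Approximation of the call site: copy the argument, reserve a return slot.
Str200 call_rev(const Str200& original)
{
    Str200 arg_copy;
    std::memcpy(&arg_copy, &original, sizeof arg_copy);  // rep movsq before call
    Str200 result;
    rev_lowered(&result, &arg_copy);
    return result;
}
```

The point of the sketch is that both the by-value argument and the by-value return turn into a block copy of the whole object, while the caller's original stays untouched.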

[1] Using a higher optimisation level than that makes the compiler inline too much of the code (for example, the rev_my_str function itself gets inlined), and it gets very hard to see what is going on.