
Are pass-by-value structs pushed onto the stack?


Do C/C++ compilers pass structs by value on the stack, copying hundreds of bytes there if the programmer specifies a large struct? Does returning a struct incur the same penalty?


Solution

  • Yes, the compiler will almost certainly do something like a memcpy to copy a struct or class of hundreds of bytes onto the stack if that's what you asked for. If it didn't, something like this wouldn't work, because rev modifies its argument while the caller's s has to stay intact:

    std::string s = "A large amount of text";
    
    std::string r = rev(s);
    std::cout << s << " reversed is " << r << std::endl; 
    
    ...
    std::string rev(std::string s)
    {
       std::string::size_type len = s.length();
       for(std::string::size_type i = 0; i < len / 2; i++)
       {
          std::swap(s[i], s[len - 1 - i]);  // last character is at len-1, not len
       }
       return s;
    }       
    

    This is why it's nearly always recommended to pass large objects by const reference when possible: only a pointer to the object is passed, and no copy is made.
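
    As a minimal sketch of that difference (Big and both function names are hypothetical, not part of the examples here), the by-value parameter is a separate object with its own address, while the const reference refers to the caller's object:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical large struct, roughly the size discussed in the question.
struct Big {
    char data[200];
    std::size_t len;
};

// By value: b is a fresh copy living in the callee's argument area,
// so its address differs from the caller's object.
bool same_object_by_value(Big b, const Big* caller_object) {
    return &b == caller_object;
}

// By const reference: b is just another name for the caller's object,
// so only an address is passed and nothing is copied.
bool same_object_by_ref(const Big& b, const Big* caller_object) {
    return &b == caller_object;
}

void demo() {
    Big original{};
    assert(!same_object_by_value(original, &original));  // a copy was made
    assert(same_object_by_ref(original, &original));     // no copy
}
```

    Calling demo() runs both checks. The by-value version is still the right choice when the function needs its own modifiable copy anyway, as rev does.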

    Since the above example got objected to, here's another example:

    class mystring
    {
        char s[200];
        size_t len;
      public:
        mystring(const char *aS)
        {
           strcpy(s, aS);
           len = strlen(s);
        }
        char& operator[](int index)
        {
           return s[index];
        }
        size_t length() 
        { 
           return len; 
        }
    };
    
    mystring str("Some long string");
    mystring rev = rev_my_str(str);
    
    mystring rev_my_str(mystring s)
    {
       size_t len = s.length();
       for(size_t i = 0; i < len / 2; i++)
       {
          std::swap(s[i], s[len - 1 - i]);  // last character is at len-1, not len
       }
       return s;
    }
    

    In fact, this will make space for TWO mystring objects on the stack, one for s going into rev_my_str, and one for the return value.
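
    One way to observe those two objects is to count copy-constructor calls. The sketch below uses a hypothetical Counted struct in place of mystring (same data members, plus a counter), so it illustrates the idea rather than reproducing the code above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for mystring that counts how often it is copied.
struct Counted {
    char s[200];
    std::size_t len;
    static int copies;

    Counted() : s{}, len(0) {}
    Counted(const Counted& other) : len(other.len) {
        std::memcpy(s, other.s, sizeof s);
        ++copies;  // one more 208-byte-ish object was created by copying
    }
};
int Counted::copies = 0;

// Takes its argument by value and returns it by value, like rev_my_str.
Counted pass_through(Counted c) {
    return c;  // a by-value parameter cannot be elided on return, so this copies
}

int count_copies_for_one_call() {
    Counted original;
    Counted::copies = 0;
    Counted result = pass_through(original);
    (void)result;
    return Counted::copies;
}
```

    On mainstream compilers count_copies_for_one_call() should report two copies: one into the parameter and one into the return value. Copy elision only removes the further temporary between the return value and result.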

    Edit:

    Assembler generated by g++ -O1 [1] for the call to rev_my_str as above. The interesting bit is the rep movsq along with the setup of %ecx, %rsi and %rdi (count, source and destination, respectively). $26 is the number of 8-byte units that it will copy: 26 * 8 = 208 bytes, which is 200 bytes for s plus 8 for len. %rsp is the stack pointer. This is almost exactly how a memcpy would look if it were inlined in a simple form [the actual memcpy most likely does a whole bunch of extra work to deal with unaligned start/end, use SSE instructions, etc].

    movl    $26, %ecx
    movq    %rsp, %rdi
    movq    %rbx, %rsi
    rep movsq
    leaq    416(%rsp), %rdi
    call    _Z10rev_my_str8mystring
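
    The count of 26 loaded into %ecx above can be reproduced from the object size. This is a small sketch assuming a 64-bit target where size_t is 8 bytes; Layout and quadwords are hypothetical names that only mirror mystring's data members:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct with the same data members as mystring.
struct Layout {
    char s[200];
    std::size_t len;
};

// Number of 8-byte units needed to cover a given number of bytes,
// i.e. the repeat count a rep movsq copy would need.
constexpr std::size_t quadwords(std::size_t bytes) {
    return (bytes + 7) / 8;
}

// Repeat count for copying one Layout object on the current platform.
std::size_t quadwords_for_layout() {
    return quadwords(sizeof(Layout));
}

void check_count() {
    assert(quadwords(208) == 26);  // 208 bytes = 26 quadwords
}
```

    On x86-64 Linux sizeof(Layout) is 208, so quadwords_for_layout() gives the same 26 as in the listing. On a platform with a different size_t the number would differ.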
    

    And rev_my_str itself looks like this. Note the rep movsq at the bottom of the function: that's where it copies the resulting string into the return slot, whose address the caller passed in %rdi (saved in %rax on entry).

    _Z10rev_my_str8mystring:
    .LFB990:
    .cfi_startproc
    movq    %rdi, %rax
    movq    208(%rsp), %r9
    movq    %r9, %r10
    shrq    %r10
    je  .L5
    addq    $1, %r10
    movl    $1, %edx
    .L6:
    movl    %r9d, %ecx
    subl    %edx, %ecx
    leaq    7(%rsp), %rsi
    addq    %rdx, %rsi
    movzbl  (%rsi), %edi
    movslq  %ecx, %rcx
    movzbl  8(%rsp,%rcx), %r8d
    movb    %r8b, (%rsi)
    movb    %dil, 8(%rsp,%rcx)
    addq    $1, %rdx
    cmpq    %r10, %rdx
    jne .L6
    .L5:
    movl    $26, %ecx
    movq    %rax, %rdi
    leaq    8(%rsp), %rsi
    rep movsq
    ret
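
    Written back as C++, the listing above behaves roughly like a function that receives a pointer to a caller-provided result slot. The following is a hand-written approximation, not compiler output: Text, make_text, reversed and reversed_lowered are hypothetical names, and passing the argument copy by pointer is a simplification (on x86-64 System V the copy sits directly in the stack argument area).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical plain struct with the same data members as mystring.
struct Text {
    char s[200];
    std::size_t len;
};

Text make_text(const char* src) {
    Text t{};                                // zero-filled, so s stays terminated
    std::strncpy(t.s, src, sizeof t.s - 1);
    t.len = std::strlen(t.s);
    return t;
}

// Source-level form: argument and result are both passed by value.
Text reversed(Text t) {
    for (std::size_t i = 0; i < t.len / 2; ++i)
        std::swap(t.s[i], t.s[t.len - 1 - i]);
    return t;
}

// Approximate lowered form: the caller makes the copy and supplies the
// result slot; the callee works on the copy, then block-copies it out.
Text* reversed_lowered(Text* result, Text* arg_copy) {
    for (std::size_t i = 0; i < arg_copy->len / 2; ++i)
        std::swap(arg_copy->s[i], arg_copy->s[arg_copy->len - 1 - i]);
    std::memcpy(result, arg_copy, sizeof(Text));  // the final rep movsq
    return result;                                // returned in %rax
}
```

    A caller of the lowered form would write Text copy = original; Text result; reversed_lowered(&result, &copy); which makes both stack objects and both block copies explicit.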
    

    [1] Using a higher optimisation level than that makes the compiler inline too much of the code (for example, rev_my_str itself gets inlined), and it gets very hard to see what is going on.