Do C/C++ compilers push structs by value onto the stack, memcopying hundreds of bytes onto the stack if the programmer specifies a large struct? Does returning structs incur the same penalty?
Yes, the compiler will almost certainly do something like a memcpy to copy a struct or class of hundreds of bytes onto the stack if that's what you asked for. If that weren't the case, something like this wouldn't work:
std::string s = "A large amount of text";
std::string r = rev(s);
std::cout << s << " reversed is " << r << std::endl;
...
std::string rev(std::string s)
{
    std::string::size_type len = s.length();
    for(std::string::size_type i = 0; i < len / 2; i++)
    {
        // Swap s[i] with its mirror; the last valid index is len - 1.
        std::swap(s[i], s[len - 1 - i]);
    }
    return s;
}
This is why it's nearly always recommended to pass by const reference when possible, as that passes just a pointer to the object.
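As a rough illustration of the difference, here is a minimal sketch. The struct `Big` and the helper functions are hypothetical names made up for this example. It compares the address the callee sees with the address of the caller's object: by value the callee works on its own copy, by const reference it sees the caller's object.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical large payload, standing in for any big struct.
struct Big {
    char data[200];
    std::size_t len;
};

// By value: the callee receives its own copy, so the addresses differ.
bool same_object_by_value(Big b, const Big* caller) {
    return &b == caller;
}

// By const reference: the callee refers to the caller's object, no copy is made.
bool same_object_by_cref(const Big& b, const Big* caller) {
    return &b == caller;
}

void demo_addresses() {
    Big big = {};
    assert(!same_object_by_value(big, &big));  // distinct copy
    assert(same_object_by_cref(big, &big));    // same object
}
```

Calling `demo_addresses()` from `main` should pass both assertions. The copy in the by-value case is what costs the time and stack space.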
Since the above example got objected to, here's another example:
class mystring
{
    char s[200];
    size_t len;
public:
    mystring(const char *aS)
    {
        // Assumes aS fits in 200 bytes, including the terminating '\0'.
        strcpy(s, aS);
        len = strlen(s);
    }
    char& operator[](int index)
    {
        return s[index];
    }
    size_t length()
    {
        return len;
    }
};
mystring str("Some long string");
mystring rev = rev_my_str(str);
...
mystring rev_my_str(mystring s)
{
    size_t len = s.length();
    for(size_t i = 0; i < len / 2; i++)
    {
        // Swap s[i] with its mirror; the last valid index is len - 1.
        std::swap(s[i], s[len - 1 - i]);
    }
    return s;
}
In fact, this will make space for TWO mystring objects on the stack, one for s going into rev_my_str, and one for the return value.
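One way to see the two copies without reading assembler is to count copy-constructor calls. This is only a sketch: `Counted` and `pass_through` are hypothetical stand-ins with the same shape as mystring and rev_my_str, and the count of two assumes a typical compiler that elides the copy from the returned temporary into the result variable (guaranteed from C++17 on).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical instrumented stand-in for mystring: counts copy constructions.
struct Counted {
    static int copies;
    char s[200];
    Counted() { std::memset(s, 0, sizeof s); }
    Counted(const Counted& other) {
        std::memcpy(s, other.s, sizeof s);
        ++copies;
    }
};
int Counted::copies = 0;

// Same shape as rev_my_str: takes its argument by value, returns it by value.
Counted pass_through(Counted c) {
    return c;  // returning a parameter cannot be elided, so this copies
}

// Returns how many copy constructions a single call costs.
int copies_for_one_call() {
    Counted::copies = 0;
    Counted original;
    Counted result = pass_through(original);  // copy in, copy out
    (void)result;
    return Counted::copies;
}

void demo_copies() {
    assert(copies_for_one_call() == 2);
}
```

Changing the parameter to `const Counted&` would remove the first copy. The copy for the return value remains, because a new object still has to be produced.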
Edit:
Assembler generated by g++ -O1 [1] for the call to rev_my_str as above. The interesting bit is the rep movsq along with the setup of %ecx, %rsi and %rdi (count, source and destination, respectively). $26 is the number of 8-byte units that it will copy: 26 * 8 = 208 bytes. %rsp is the stack pointer. This is almost exactly how a memcpy would look if it was inlined in a simple form [the actual memcpy most likely has a whole bunch of extra work to deal with unaligned start/end, using SSE instructions, etc].
movl $26, %ecx
movq %rsp, %rdi
movq %rbx, %rsi
rep movsq
leaq 416(%rsp), %rdi
call _Z10rev_my_str8mystring
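The 208-byte figure can be cross-checked from the source side. The sketch below uses a hypothetical struct, `MystringLayout`, that mirrors mystring's data members, and it assumes a typical 64-bit platform where size_t is 8 bytes; on a 32-bit target the numbers would differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of mystring's data members, for size checks only.
struct MystringLayout {
    char s[200];
    std::size_t len;
};

// Number of 8-byte units needed to copy the whole object.
constexpr std::size_t quadwords() {
    return sizeof(MystringLayout) / 8;
}

void demo_size() {
    // len sits directly after the 200-byte array; 200 is a multiple of the
    // usual size_t alignment (4 or 8), so no padding is needed in between.
    assert(offsetof(MystringLayout, len) == 200);
    assert(sizeof(MystringLayout) == 200 + sizeof(std::size_t));
    if (sizeof(std::size_t) == 8) {
        assert(quadwords() == 26);  // matches the $26 loaded into %ecx
    }
}
```

With an 8-byte size_t, `sizeof` gives 208, which is 26 quadwords, the count the rep movsq is given.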
And rev_my_str itself looks like this. Note the rep movsq at the bottom of the function. That's where it stores back the resulting string.
_Z10rev_my_str8mystring:
.LFB990:
.cfi_startproc
movq %rdi, %rax
movq 208(%rsp), %r9
movq %r9, %r10
shrq %r10
je .L5
addq $1, %r10
movl $1, %edx
.L6:
movl %r9d, %ecx
subl %edx, %ecx
leaq 7(%rsp), %rsi
addq %rdx, %rsi
movzbl (%rsi), %edi
movslq %ecx, %rcx
movzbl 8(%rsp,%rcx), %r8d
movb %r8b, (%rsi)
movb %dil, 8(%rsp,%rcx)
addq $1, %rdx
cmpq %r10, %rdx
jne .L6
.L5:
movl $26, %ecx
movq %rax, %rdi
leaq 8(%rsp), %rsi
rep movsq
ret
[1] Using higher optimisation than that makes the compiler inline too much of the code (for example, the rev_my_str function gets inlined), and it gets very hard to see what goes on.