
Default pass-by-reference semantics in C++


EDIT: This question is more about language engineering than about C++ itself. I used C++ as an example to show what I wanted, mostly because I use it daily. I didn't want to know how it works in C++, but to open a discussion on how it could be done.

That's not the way it works right now; that's the way I wish it could be done. It would break C compatibility for sure, but that's what I think extern "C" is all about.

I mean, in every function or method that you declare right now, you have to explicitly state that the object will be sent by reference by prefixing it with the reference operator. I wish that every non-POD type would be automatically sent by reference, because I use that a lot: for every object that is more than 32 bits in size, and that's almost every class of mine.

Here is how it works right now, assuming a, b and c are classes:

class example {
    public:
        int just_use_a(const a &object);
        int use_and_mess_with_b(b &object);
        void do_nothing_on_c(c object);
};

Now what I wish:

class example {
    public:
        int just_use_a(const a object);
        int use_and_mess_with_b(b object);
        extern "C" void do_nothing_on_c(c object);
};

Now, do_nothing_on_c() could behave just like it does today.

That would be interesting, at least for me. It feels much clearer, and if you know every non-POD parameter arrives by reference, I believe you would make no more mistakes than if you had to declare it explicitly.

Another point of view on this change: to someone coming from C, the reference operator looks like a way to get the address of a variable, since that's how I used it to get pointers. I mean, it is the same operator with different semantics in different contexts. Doesn't that feel a little bit wrong to you too?


Solution

  • I guess you're missing the point of C++ and of C++ semantics. C++ is right to pass (almost) everything by value, because that is the way it's done in C. Always. And not only in C, as I'll show you below...

    Parameter Semantics in C

    In C, everything is passed by value. Primitives and PODs are passed by copying their value. Modify them in your function, and the original won't be modified. Still, the cost of copying some PODs can be non-trivial.

    When you use the pointer notation (the * ), you're not passing by reference. You're passing a copy of the address. Which is more or less the same, with but one subtle difference:

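    #include <stdlib.h> /* malloc */
    
    typedef struct { int value ; } P ;
    
    /* p is a pointer to P, received as a copy of the caller's address */
    void doSomethingElse(P * p)
    {
       p->value = 32 ;         /* writes through to the caller's object */
       p = malloc(sizeof(P)) ; /* rebinds the local copy only; don't bother with the leak */
       p->value = 42 ;         /* modifies the new object, invisible to the caller */
    }
    
    void doSomething()
    {
       P * p = malloc(sizeof(P)) ;
       p->value = 25 ;
    
       doSomethingElse(p) ;
    
       int i = p->value ;
       /* Value of i ? 25 ? 32 ? 42 ? */
    }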
    

    The final value of p->value is 32, because p was passed by copying the value of the address. The original p was therefore not modified (and the new allocation was leaked).

    Parameter Semantics in Java and C#

    It can be surprising for some, but in Java, everything is passed by value, too. The C example above would give exactly the same results in Java. This is almost what you want, but you would not be able to pass a primitive "by reference/pointer" as easily as in C.

    In C#, they added the "ref" keyword. It works more or less like the reference in C++. The point is, in C#, you have to mention it both in the function declaration and at each and every call. I guess this is not what you want, again.

    Parameter Semantics in C++

    In C++, almost everything is passed by copying the value. When you're using nothing but the type of the symbol, you're copying the symbol (like it is done in C). This is why, when you're using the *, you're passing a copy of the address of the symbol.

    But when you're using the &, you are passing the real object itself (be it a struct, an int, a pointer, whatever): that is the reference.
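
    To make the three notations concrete, here is a minimal sketch. The function names are hypothetical and chosen only for this illustration; each one receives the same int in a different way.

    ```cpp
    // Hypothetical helper names, used only for this illustration.

    // By value: n is a private copy, so the caller's variable is untouched.
    void byValue(int n)
    {
        n = 1;
    }

    // By pointer: the address is copied, but it still designates the caller's int.
    void byPointer(int * n)
    {
        *n = 2;
    }

    // By reference: n is another name for the caller's int.
    void byReference(int & n)
    {
        n = 3;
    }
    ```

    Starting from int x = 0, calling byValue(x) leaves x at 0, byPointer(&x) sets it to 2, and byReference(x) sets it to 3.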

    It is easy to mistake it for syntactic sugar (i.e., behind the scenes, it works like a pointer, and the generated code is the same as for a pointer). But...

    The truth is that the reference is more than syntactic sugar.

    • Unlike pointers, it allows manipulating the object as if it were on the stack.
    • Unlike pointers, when associated with the const keyword, it allows implicit conversion from one type to another (mainly through constructors).
    • Unlike pointers, the symbol is not supposed to be NULL/invalid.
    • Unlike "by-copy", you are not spending useless time copying the object.
    • Unlike "by-copy", you can use it as an [out] parameter.
    • Unlike "by-copy", you can use the full range of OOP in C++ (i.e. you can pass a full object to a function expecting an interface).

    So, references have the best of both worlds.
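
    A few of the points above can be sketched in code. Counter, bumpViaPointer, bumpViaReference, length and divide are hypothetical names invented for this sketch, not something taken from the question.

    ```cpp
    #include <cstddef>
    #include <string>

    // Hypothetical type, used only for this illustration.
    struct Counter { int hits; };

    // Pointer version: needs the -> notation and has to consider NULL.
    void bumpViaPointer(Counter * c)
    {
        if (c != NULL) { ++c->hits; }
    }

    // Reference version: same notation as for a local object, no NULL case.
    void bumpViaReference(Counter & c)
    {
        ++c.hits;
    }

    // const reference: a string literal is implicitly converted to a
    // temporary std::string through std::string's constructor.
    std::size_t length(const std::string & text)
    {
        return text.size();
    }

    // Non-const references used as [out] parameters.
    void divide(int dividend, int divisor, int & quotient, int & remainder)
    {
        quotient  = dividend / divisor;
        remainder = dividend % divisor;
    }
    ```

    For instance, length("hello") compiles although the argument is not a std::string, and divide(17, 5, q, r) writes 3 into q and 2 into r.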

    Let's see the C example, but with a C++ variation on the doSomethingElse function:

    #include <cstdlib> // malloc
    
    struct P { int value ; } ;
    
    // p is a reference to a pointer to P
    void doSomethingElse(P * & p)
    {
       p->value = 32 ;               // writes through to the caller's object
       p = (P *) malloc(sizeof(P)) ; // rebinds the caller's pointer; don't bother with the leak
       p->value = 42 ;               // modifies the object the caller now points to
    }
    
    void doSomething()
    {
       P * p = (P *) malloc(sizeof(P)) ;
       p->value = 25 ;
    
       doSomethingElse(p) ;
    
       int i = p->value ;
       // Value of i ? 25 ? 32 ? 42 ?
    }
    

    The result is 42, and the old p was leaked, replaced by the new p. This is because, unlike in the C code, we're not passing a copy of the pointer but a reference to the pointer, that is, the pointer itself.

    When working with C++, the above example must be crystal clear. If it is not, then you're missing something.

    Conclusion

    C++ is pass-by-copy/value because it is the way everything works, be it in C, in C# or in Java (even in JavaScript... :-p ...). And like C#, C++ has a reference operator/keyword, as a bonus.

    Now, as far as I understand it, you are perhaps doing what I half-jokingly call C+, that is, C with some limited C++ features.

    Perhaps your solution is using typedefs (it will enrage your C++ colleagues, though, to see the code polluted by useless typedefs...), but doing this will only obfuscate the fact you're really missing C++ there.
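
    For the record, here is one way such typedefs might look. This is only a guess at the kind of aliasing meant above; Account, AccountIn and AccountInOut are hypothetical names invented for this sketch.

    ```cpp
    #include <string>

    // Hypothetical class, used only for this illustration.
    struct Account
    {
        std::string owner;
        int balance;
    };

    // The reference is hidden inside the alias, so the parameter list
    // looks like pass-by-value although nothing is copied.
    typedef const Account & AccountIn;
    typedef Account & AccountInOut;

    int ownerNameLength(AccountIn account)
    {
        return static_cast<int>(account.owner.size());
    }

    void deposit(AccountInOut account, int amount)
    {
        account.balance += amount; // modifies the caller's object
    }
    ```

    The signature of deposit no longer shows that the caller's object can be modified, which is exactly the obfuscation mentioned above.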

    As said in another post, you should change your mindset from C development (or whatever) to C++ development, or you should perhaps move to another language. But do not keep programming the C way with C++ features, because by consciously ignoring/obfuscating the power of the idioms you use, you'll produce suboptimal code.

    Note: And do not pass by copy anything other than primitives. You'll castrate your function of its OO capacity, and in C++, this is not what you want.
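
    The loss of OO capacity mentioned in the note is usually called object slicing. A minimal sketch, with hypothetical Animal and Dog classes invented for the illustration:

    ```cpp
    #include <string>

    // Hypothetical classes, used only for this illustration.
    struct Animal
    {
        virtual ~Animal() {}
        virtual std::string sound() const { return "..."; }
    };

    struct Dog : Animal
    {
        virtual std::string sound() const { return "woof"; }
    };

    // By copy: only the Animal part of the argument is copied (slicing),
    // so the override in Dog is lost.
    std::string soundByCopy(Animal a)
    {
        return a.sound();
    }

    // By reference: the dynamic type is preserved and nothing is copied.
    std::string soundByReference(const Animal & a)
    {
        return a.sound();
    }
    ```

    Given a Dog d, soundByCopy(d) returns "..." while soundByReference(d) returns "woof".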

    Edit

    The question was somewhat modified (see https://stackoverflow.com/revisions/146271/list ). I left my original answer as it was, and I answer the new questions below.

    "What do you think about default pass-by-reference semantics in C++?"

    Like you said, it would break compatibility, and you would have different pass-by semantics for primitives (i.e. built-in types, which would still be passed by copy) and for structs/objects (which would be passed as references). You would have to add another operator to mean "pass-by-value" (extern "C" is quite awful, and it is already used for something quite different). No, I really like the way it is done today in C++.

    "[...] the reference operator seems to me a way to get the variable address, that's the way I used for getting pointers. I mean, it is the same operator but with different semantic on different contexts, doesn't that feel a little bit wrong for you too?"

    Yes and no. Operator >> changed its semantics when used with C++ streams, too. Likewise, you can use operator += on std::string to replace strcat. I guess the & symbol was chosen for its meaning as the "opposite of pointer", and because they did not want to introduce yet another symbol (ASCII is limited, and the scope operator :: as well as the pointer operator -> show that few other symbols are usable). But if & bothers you, && will really unnerve you: C++0x adds && in declarations to denote rvalue references (a kind of super-reference...). I've yet to digest it myself...
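
    The different meanings of & (and of && once C++0x is available) can be put side by side. This is a sketch only: demoAmpersand and kind are hypothetical names, and the && overload assumes a compiler with C++0x/C++11 support.

    ```cpp
    #include <string>
    #include <utility>

    int demoAmpersand()
    {
        int value = 10;
        int * address = &value; // & in an expression: address-of
        int & alias = value;    // & in a declaration: reference

        *address += 1;          // both lines modify the same int
        alias += 1;
        return value;           // 12
    }

    // & and && in parameter declarations: lvalue and rvalue references.
    std::string kind(const std::string &) { return "lvalue"; }
    std::string kind(std::string &&)      { return "rvalue"; }
    ```

    With std::string s("x"), kind(s) selects the first overload, while kind(std::string("x")) and kind(std::move(s)) select the second.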