How should I use storage class specifiers like ref, in, out, etc. in function arguments in D?


There are comparatively many storage class specifiers for function arguments in D:

  • none
  • in (which is equivalent to const scope)
  • out
  • ref
  • scope
  • lazy
  • const
  • immutable
  • shared
  • inout

What's the rationale behind them? Their names already suggest the obvious use. However, there are some open questions:

  1. Should I use ref combined with in for struct type function arguments by default?
  2. Does out imply ref implicitly?
  3. When should I use none?
  4. Does ref on classes and/or interfaces make sense? (Class types are references by default.)
  5. How about ref on array slices?
  6. Should I use const for built-in arithmetic types, whenever possible?

More generally put: When and why should I use which storage class specifier for function argument types in case of built-in types, arrays, structs, classes and interfaces? (In order to isolate the scope of the question a little bit, please don't discuss shared, since it has its own isolated meaning.)


Solution

    1. I wouldn't use either by default. ref parameters only accept lvalues, and ref implies that you're going to alter the argument being passed in. If you want to avoid copying, use const ref or auto ref. But const ref still requires an lvalue, so unless you want to duplicate your functions, it's frequently more annoying than it's worth. And while auto ref avoids copying lvalues (it basically generates one version of the function that takes lvalues by ref and one that takes rvalues without ref), it only works with templates, which limits its usefulness. Using const can also have far-reaching consequences, because D's const is transitive and because it's undefined behavior to cast away const from a variable and modify it. So, while const is often useful, using it by default is likely to get you into trouble.

      Using in gives you scope in addition to const, which I'd generally advise against. scope on function parameters is supposed to guarantee that no reference to that data can escape the function, but the checks for it aren't properly implemented yet, so you can actually use it in a lot more situations than are supposed to be legal. There are some cases where scope is invaluable (e.g. with delegates, since it means the compiler doesn't have to allocate a closure), but for other types, it can be annoying (e.g. if you pass an array by scope, then you can't return a slice of that array from the function). Any struct containing arrays or reference types would be affected as well. And while you won't get many complaints about incorrect uses of scope right now, if you've been using it all over the place, you're bound to get a lot of errors once it's fixed. Also, it's utterly pointless for value types, since they have no references to escape. So, for a value type (including structs which are value types), const and in are effectively identical.
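
      The lvalue/rvalue behaviour of ref, const ref and auto ref described in this point can be sketched as follows. This is only an illustration under default compiler settings; Big, makeBig, sumRef, sumAutoRef and boundByRef are hypothetical names that do not come from the question.

      ```d
      // Hypothetical value type that is large enough to make copying noticeable.
      struct Big
      {
          int[64] payload;
      }

      // Hypothetical helper that produces an rvalue.
      Big makeBig()
      {
          Big b;
          b.payload[] = 1;
          return b;
      }

      // const ref: no copy is made, but only lvalues are accepted.
      int sumRef(ref const Big b)
      {
          int total = 0;
          foreach (x; b.payload)
              total += x;
          return total;
      }

      // auto ref requires a template: lvalues bind by ref, rvalues by value.
      int sumAutoRef(T)(auto ref const T b)
      {
          int total = 0;
          foreach (x; b.payload)
              total += x;
          return total;
      }

      // Reports whether this instantiation received its argument by ref.
      bool boundByRef(T)(auto ref T value)
      {
          return __traits(isRef, value);
      }

      void main()
      {
          Big big = makeBig();

          assert(sumRef(big) == 64);            // lvalue: accepted
          // sumRef(makeBig());                 // rvalue: rejected by default

          assert(sumAutoRef(big) == 64);        // lvalue, passed by ref
          assert(sumAutoRef(makeBig()) == 64);  // rvalue, passed by value

          assert(boundByRef(big));
          assert(!boundByRef(makeBig()));
      }
      ```

      Because sumAutoRef is a template, it cannot be a virtual method, which is one of the practical limits mentioned above.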

    2. out is the same as ref except that it resets the parameter to its init value so that you always get the same value passed in regardless of what the previous state of the variable being passed in was.
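
      A minimal sketch of that difference, using the hypothetical functions bumpRef and setOut:

      ```d
      // ref: the function sees whatever value the caller's variable holds.
      void bumpRef(ref int x)
      {
          x += 1;
      }

      // out: x is reset to int.init (0) on entry, then written through
      // to the caller's variable just like ref.
      void setOut(out int x)
      {
          assert(x == 0);
          x += 1;
      }

      void main()
      {
          int a = 41;
          bumpRef(a);
          assert(a == 42);   // previous value was kept and incremented

          int b = 41;
          setOut(b);
          assert(b == 1);    // previous value was discarded first
      }
      ```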

    3. Almost always as far as function arguments go. You use const or scope or whatnot when you have a specific need for it, but I wouldn't advise using any of them by default.

    4. Of course it does. ref is separate from the concept of class references. It's a reference to the variable being passed in. If I do

      void func(ref MyClass obj)
      {
          obj = new MyClass(7);
      }
      
      auto var = new MyClass(5);
      func(var);
      

      then var will refer to the newly constructed MyClass(7) after the call to func rather than to the original MyClass(5). You're passing the reference by ref. It's just like how taking the address of a reference (like var) gives you a pointer to a reference and not a pointer to a class object.

      MyClass* p = &var; //points to var, _not_ to the object that var refers to.
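
      For contrast, here is a self-contained sketch that puts the ref and non-ref cases side by side. The definition of MyClass and the function names rebindByRef, rebindByValue and mutate are assumptions made for the example.

      ```d
      // Assumed definition; the snippets above do not show MyClass itself.
      class MyClass
      {
          int value;
          this(int value) { this.value = value; }
      }

      void rebindByRef(ref MyClass obj)
      {
          obj = new MyClass(7);   // rebinds the caller's variable
      }

      void rebindByValue(MyClass obj)
      {
          obj = new MyClass(7);   // rebinds only the local copy of the reference
      }

      void mutate(MyClass obj)
      {
          obj.value = 9;          // changes the object both references share
      }

      void main()
      {
          auto var = new MyClass(5);
          auto original = var;

          rebindByValue(var);
          assert(var is original && var.value == 5);

          mutate(var);
          assert(original.value == 9);

          rebindByRef(var);
          assert(var !is original && var.value == 7);

          MyClass* p = &var;      // pointer to the reference variable
          assert(*p is var);
      }
      ```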
      
    5. Same deal as with classes. ref makes the parameter refer to the variable passed in. e.g.

      void func(ref int[] arr)
      {
          arr ~= 5;
      }
      
      auto var = [1, 2, 3];
      func(var);
      assert(var == [1, 2, 3, 5]);
      

      If func didn't take its argument by ref, then var would have been sliced, and appending to arr would not have affected var. But since the parameter was ref, anything done to arr is done to var.
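
      The non-ref case can be sketched like this (appendByValue and setFirst are hypothetical names). Note that a non-ref slice still shares its elements with the caller; only the slice itself, meaning its pointer and length, is copied.

      ```d
      void appendByValue(int[] arr)
      {
          arr ~= 5;      // grows only the local slice
      }

      void setFirst(int[] arr)
      {
          arr[0] = 99;   // writes through to the shared elements
      }

      void main()
      {
          auto var = [1, 2, 3];

          appendByValue(var);
          assert(var == [1, 2, 3]);    // the caller's length is unchanged

          setFirst(var);
          assert(var == [99, 2, 3]);   // the element change is visible
      }
      ```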

    6. That's totally up to you. Making it const makes it so that you can't mutate it, which means that you're protected from accidentally mutating it if you don't intend to ever mutate it. It might also enable some optimizations, but if you never write to the variable, and it's a built-in arithmetic type, then the compiler knows that it's never altered and the optimizer should be able to do those optimizations anyway (though whether it does or not depends on the compiler's implementation).

      immutable and const are effectively identical for the built-in arithmetic types in almost all cases, so personally, I'd just use immutable if I want to guarantee that such a variable doesn't change. In general, using immutable instead of const when you can gives you better optimizations and better guarantees, since it allows the variable to be implicitly shared across threads (if applicable) and it always guarantees that the variable can't be mutated (whereas for reference types, const only means that that particular reference can't mutate the object, not that the object can't be mutated).

      Certainly, if you mark your variables const and immutable as much as possible, then it does help the compiler with optimizations at least some of the time, and it makes it easier to catch bugs where you mutated something when you didn't mean to. It also can make your code easier to understand, since you know that the variable is not going to be mutated. So, using them liberally can be valuable. But again, using const or immutable can be overly restrictive depending on the type (though that isn't a problem with the built-in integral types), so just automatically marking everything as const or immutable can cause problems.
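
      As a rough illustration of the difference between const and immutable described in this point (twice, data, view and fixed are made-up names for the sketch):

      ```d
      // A const value parameter accepts mutable, const and immutable ints alike.
      int twice(const int x)
      {
          // x = 3;   // would not compile: x is const
          return x * 2;
      }

      void main()
      {
          int m = 21;
          immutable int i = 21;
          assert(twice(m) == 42 && twice(i) == 42);

          // For reference types, const is only a read-only view.
          int[] data = [1, 2, 3];
          const(int)[] view = data;
          data[0] = 10;
          assert(view[0] == 10);   // the change is visible through the const view
          static assert(!__traits(compiles, view[0] = 10));

          // immutable data can never change through any reference.
          immutable(int)[] fixed = [1, 2, 3];
          static assert(!__traits(compiles, fixed[0] = 10));

          // Mutable data does not implicitly convert to immutable.
          static assert(!__traits(compiles, { immutable(int)[] bad = data; }));
      }
      ```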