How do you flexibly retain useful object functionality between the usage and definition of a method?


Exposition:

Suppose I have this trivial interface:

interface Y { Y f(); }

I can implement it in 3 different ways:

  1. Use the general type everywhere.

    class SubY_generalist implements Y
    {
        public Y f()
        {
            Y y = new SubY_generalist();
            ...
            return y;
        }
    }
    
  2. Use the special type, but return the same value implicitly cast into general type.

    class SubY_mix implements Y
    {
        public Y f()
        {
            SubY_mix y = new SubY_mix();
            ...
            return y;
        }
    }
    
  3. Use the special type and return it the same.

    class SubY_specialist implements Y
    {
        public SubY_specialist f()
        {
            SubY_specialist y = new SubY_specialist();
            ...
            return y;
        }
    }
    

 

My considerations:

There is a lengthy conversation nearby on the benefits of "programming to an interface". The most upvoted answers do not seem to go deep into the distinction between argument types and return types, which are in fact fundamentally distinct. Since that discussion does not give me a clear-cut answer, I have no choice but to speculate on it by myself, unless the kind reader can lend me a hand, of course.

I will assume the following basic facts about Java: (Are they correct?)

  1. An object is created at its most special.
  2. It may be implicitly cast into a more general type (generalized) at any time.
  3. When an object is generalized, it loses some of its useful properties, but gains none.

From these simple points, it follows that a special object is more useful, but also more dangerous than the general.

As an example, I may have a mutable container that I can generalize into being immutable. If I ensure the container has some useful properties before being thus frozen, I can generalize it at the right time to prevent the user from accidentally breaking the invariant. But is this the right way? There is another way to achieve similar isolation: I may always just make my methods package-private. Generalizing seems to be more flexible, but it is easy to forget, which opens a surface for subtle bugs.

But in some languages, like Python, it is deemed unnecessary to ever actually protect methods from outside access; they just label the internal methods with an underscore. A proficient user may access the internal methods to achieve some gain, provided that they know all the intricacies.

Another consequence is that inside the method definitions I should prefer specialized objects.

 

My questions:

  • Is this right thinking?
  • Am I missing something?
  • How does this relate to the talk of programming to an interface? Some locals here seem to think it is relevant, and I agree that it is, just not immediately, as I see it. It is more like programming against an interface here, or against a subclass in general. I am a bit at a loss about these intricacies.

Solution

  • Here is the context I'm using to talk about the topic. Let there be the following options for a function f:

    interface Y { ... }
    class SubY implements Y { ... }
    DEFINITION CHOICE:
    public SubY f() { ... }
    OR
    public Y f() { ... }
    USAGE CHOICE:
    ...
    Y y = f();
    OR
    SubY y = f(); //maybe with a cast
    ...
    

    Technically, all options can be correct, depending on whether you intend to expose the details of SubY to the end-user (next programmer) or not. If it looks like a bad idea to expose SubY to the end-user, don't. Otherwise, do. Here's the reasoning:

    You should always return the narrowest type that could be needed, conceptually (narrower types are further down the class hierarchy - treat it as "a narrower type is-a wider type"). For example, if you return a List, that means that you expect the end-user to interact with it only in terms of an interface. However, if you expect that the end-user will need the interactions defined within ArrayList, return that instead.

    In general, the idea is to keep the return type from methods as narrow as possible, without exposing gory details that the end-user shouldn't be aware of. As long as you properly hide the internal details handled by your class SubY, it is perfectly fine to return a SubY. On the other hand, consider the responsibility of the end-user:

    The responsibility of the end-user is to responsibly use the power they've been given by your narrow return type. You have already prevented them from doing nasty things by hiding internals properly. Then, when the end-user uses your class, they should program in terms of their needs, like so:

    // imported library provides:
    public SubY f() { ... }
    ... // la la la
    Y y = f();
    useYFunctionality(y);
    ... // somewhere else that you need SubY functionality
    SubY subY = f();
    useSubYFunctionality(subY);
    

    Therefore, the programmer should define their functions to return precise types, returning the narrowest possible implementation that is safe (it can be an interface). On the other hand, the end-user of that service/functionality should define their variables as the widest possible type that still works, to decrease coupling and increase clarity of intent (if getLicensePlateNumber is all you need, use Vehicle instead of VeryUnreliableCar). The key here is that choosing the return type doesn't have a definite answer. Instead, use your judgment to decide what should and should not be exposed to the end-user. If there's no exceptional reason to deny access to SubY, then keep this principle in mind: "Functions Return Accurately, Users Define Variables As Necessary", or FRAUDVAN.
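    A minimal sketch of this principle, assuming hypothetical Vehicle and VeryUnreliableCar types. Only the names and getLicensePlateNumber come from the example above; the breakDown method and the buildCar factory are invented for illustration:

```java
class FraudvanDemo {
    // Hypothetical interface: only getLicensePlateNumber is taken from the text.
    interface Vehicle {
        String getLicensePlateNumber();
    }

    // Hypothetical implementation with one extra capability beyond the interface.
    static class VeryUnreliableCar implements Vehicle {
        private final String plate;

        VeryUnreliableCar(String plate) {
            this.plate = plate;
        }

        @Override
        public String getLicensePlateNumber() {
            return plate;
        }

        // Invented for illustration: exists only on the concrete class.
        boolean breakDown() {
            return true;
        }
    }

    // Definition side: return the precise type.
    static VeryUnreliableCar buildCar(String plate) {
        return new VeryUnreliableCar(plate);
    }

    public static void main(String[] args) {
        // Usage side: declare the widest type that still covers the need.
        Vehicle vehicle = buildCar("ABC-123");
        System.out.println(vehicle.getLicensePlateNumber());

        // Elsewhere, when the car-specific behavior is actually needed.
        VeryUnreliableCar car = buildCar("XYZ-789");
        System.out.println(car.breakDown());
    }
}
```

    Because buildCar returns the concrete type, neither caller needs a cast; each one decides how much of the object it wants to see.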

    Applying this to your examples:

    1. Using the interface everywhere. This is best when you only care about the details of the interface, and the same holds true for the end-user. For example, if you only care about get and add, Y could be List, and SubY_generalist could be ArrayList.

      class SubY_generalist implements Y
      {
          public Y f()
          {
              Y y = new SubY_generalist();
              ...
              return y;
          }
      }
      
    2. Using the specific type, but implicitly widening into the general type at return. This is best when you actually need to use methods for the specific type, but your end-user will never need to do so. Using ArrayList and List from the previous example, suppose f must invoke trimToSize, a method not defined in List. Then, it's clearly necessary to declare y as ArrayList. However, since the end-user only needs the List, you should return y and implicitly widen.

      class SubY_mix implements Y
      {
          public Y f()
          {
              SubY_mix y = new SubY_mix();
              ...
              return y;
          }
      }
      
    3. Using the specialized type, and returning the specialized type. This is the "essence" of FRAUDVAN. SubY_specialist as the return type allows the user the flexibility to deal with the SubY_specialist returned, or alternatively use f() in terms of the interface Y. This is incredibly common when interfaces are insufficiently specific to be used effectively (as they should be, since they're abstract). For example, in the Java Stream interface, methods such as filter and map return a Stream rather than the more general BaseStream, which Stream extends. It also happens any time you take a StringBuffer and call reverse, append, replace, delete, or insert: each returns a StringBuffer, not a CharSequence (StringBuffer implements CharSequence).

      class SubY_specialist implements Y
      {
          public SubY_specialist f()
          {
              SubY_specialist y = new SubY_specialist();
              ...
              return y;
          }
      }
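    The second and third cases can be sketched with the standard library types mentioned above. This is only an illustration: the class name ReturnWidthDemo and the methods loadNames and shout are hypothetical, while trimToSize, reverse, append, and insert are the real ArrayList and StringBuffer methods.

```java
import java.util.ArrayList;
import java.util.List;

class ReturnWidthDemo {
    // Case 2: use the concrete type internally, widen implicitly on return.
    static List<String> loadNames() {
        ArrayList<String> names = new ArrayList<>(100);
        names.add("alice");
        names.add("bob");
        names.trimToSize(); // declared on ArrayList, not on List
        return names;       // implicit widening to List<String>
    }

    // Case 3: return the concrete type. StringBuffer's own methods do the
    // same, which is what lets reverse() and append() be chained here.
    static StringBuffer shout(String word) {
        return new StringBuffer(word).reverse().append('!');
    }

    public static void main(String[] args) {
        List<String> names = loadNames();
        System.out.println(names.size());

        // The caller chooses the view: interface or concrete class.
        CharSequence asInterface = shout("abc");
        StringBuffer asClass = shout("abc");
        System.out.println(asInterface.length());
        System.out.println(asClass.insert(0, '>'));
    }
}
```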
      

    The reason this kind of behavior is needed is that interfaces often don't contain enough details to be utilized effectively by the end-user. Putting those details in interfaces would result in interface bloat/pollution, which is undesirable. Instead, directly having the class responsible for those details be used preserves the single responsibility principle for classes that have unique functionalities (avoiding being forced to implement redundancy in future implementers of the interface). After all, if your class implements two interfaces, you either have to create a new interface extending both those interfaces and return that (which can get ugly pretty quickly), or you can return the implementor, and have the end-user be responsible for which interface they need (not ugly).
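    A small sketch of the two-interface situation, assuming hypothetical Source and Resettable interfaces and a Buffer class that implements both (none of these names come from a real library):

```java
class TwoInterfacesDemo {
    // Two unrelated, hypothetical interfaces.
    interface Source {
        String read();
    }

    interface Resettable {
        void reset();
    }

    // One implementor of both; no combined "SourceAndResettable" interface is needed.
    static class Buffer implements Source, Resettable {
        private String content = "data";

        @Override
        public String read() {
            return content;
        }

        @Override
        public void reset() {
            content = "";
        }
    }

    // Returning the implementor lets each caller pick the interface it needs.
    static Buffer open() {
        return new Buffer();
    }

    public static void main(String[] args) {
        Source source = open();          // caller that only reads
        System.out.println(source.read());

        Resettable resettable = open();  // caller that only resets
        resettable.reset();

        Buffer both = open();            // caller that needs both views
        both.reset();
        System.out.println(both.read().isEmpty());
    }
}
```

    Had open() been declared to return Source, the callers that need reset() would have to cast, or the library would have to introduce a third interface extending both.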

    Why is this kind of behavior so common? It's because, once you hide away things that end-users don't need to see (using visibility modifiers, encapsulation, and information hiding), it's actually a good idea to let end-users interact with your implementations in the manner that they choose (i.e. as per the interface they declare their variable as). In fact, I'd even claim that all examples here are compliant with FRAUDVAN, including those where the "safest" type to expose to the end-user is Y instead of a specialized type. In the first example, the interface Y is sufficiently specialized for both the implementer of f and the end-users. In the second example, Y is insufficiently specialized for the implementer, but sufficiently specialized for the end-users. Finally, in the third example, Y is insufficiently specialized for the implementer and for end-users in general, although some end-users may choose to declare Y y = f() if Y will suit their needs.