Search code examples
pythonsemantics

Should you use the underscore _ as an "access modifier indicator" in non-library code in Python?


Introduction


So I've been doing a bit of research regarding the underscore character (_). I know most of its use cases and its semantics in them so I'll just drop them below as a recap and I'll conclude with the question, which is more of a conceptual one regarding 2 of the use cases.

Use cases


  1. To store the last evaluation in an interpreter (outside the interpreter it has no special semantics and it's undefined as a single character)
  2. In internationalization context, where you'd import functions such as gettext aliased as _
  3. In decimal grouping to aid visibility (specifically doing groups of 3 such as 1_000_000) - Note that this is only available from Python 3.6 onward.

    Example:

    1_000_000 == 10**6  # equals True
    x = 1_000
    print(x)  # yields 1000
    
  4. To "ignore" certain values, although I'd not call it "ignore" since those values are still evaluated and bound to _ just as though it were a regular identifier. Normally I'd find a better design than this since I find this a code smell. I have rarely used such an approach in many years so I guess whenever you think you need to use it, you can surely improve your design to not use it.

    Example:

    for _ in range(10):
        # do stuff without _
        ping_a_server()
    
    # outside loop that still exists and it still has the last evaluated value
    print(_)  # would yield 9
    
  5. Trailing an identifier (used by convention to avoid name clashes with a built-in identifier or reserved word):

    Example

    class Automobile:
    
        def __init__(self, type_='luxury', class_='executive'):
            self.car_type = type_
            self.car_class = class_
    
    noob_car = Automobile(type_='regular', class_='supermini')
    luxury = Automobile()
    
  6. As access modifiers but only as a convetion since Python doesn't have access modifiers in the true sense:

    Single leading underscore


    Acts as a weak "internal use" indicator. All identifiers starting with _ will be ignored by star imports (from M import *)

    Example:

    a.py
    
    _bogus = "I'm a bogus variable"
    __bogus = "I'm a bogus with 2 underscores"
    ___triple = 3.1415
    class_ = 'Plants'
    type_ = 'Red'
    regular_identifier = (x for x in range(10))
    
    b.py
    from a import *
    print(locals())  # will yield all but the ones starting with _
    

    Important conceptual observation

    I hate it when people call this private (excluding the fact that nothing's actually private in Python). If we were to put this in an analogy this would be equivalent to Java's protected, since in Java protected means "derived classes and/or within same package". So since at the module level any identifier with a leading underscore _ has a different semantics than a regular identifier (and I'm talking about semantics from Python's perspective not our perspective where CONSTANTS and global_variable mean different things but to Python they're the same thing) and is ignored by the import machinery when talking about start imports, this really is an indicator that you should use those identifiers within that module or within the classes they're defined in or their derived subclasses.

    Double leading underscores


    Without going into much details, this invokes a name mangling mechanism when used on identifiers inside a class, which makes it harder, but again, not impossible for folks to access an attribute from classes that subclass that base class.

Question


So I've been reading this book dedicated to beginners and at the variables section the author said something like:

Variables can begin with an underscore _ although we generally avoid doing this unless we write library code for others to use.

Which got me thinking... does it make sense to mark stuff as non-public in a private project or even in an open source project that's not used as a dependency in other projects?

For instance, I have an open source web app, that I push changes to on a regular basis. Its mostly there for educational purposes and because I want to write clean, standardized code and put in practice any new skills I acquire along the way. Now I'm thinking of the aformentioned: does it make sense to use identifiers that mark things non-public?

For the sake of argument let us say that in the future 500 people are actively contributing to that web app and it becomes very active code-wise. We can assume a large number of folks will use those "protected" and "private" identifiers directly (even when that's advised against, not all 500 of them would know the best practices) but since it's a non-library project, meaning it's not a dependency in other projects or that other people use, they can be "somewhat" reassured in a sense that those methods won't disappear in a code refactor, as the developer doing the refactor is likely going to notice all callers across the project and will refactor accordingly (or he won't notice but tests will tell him).

Evidently it makes sense in a library code, since all the people depending on your library and all the possible future people depending on your library or people that indirectly are dependant on your library (other folks embed your library into theirs and expose their library and so on) should be aware that identifiers that have a single trailing underscore or double trailing underscores are an implementation detail and can change at anytime. Therefore they should use your public API at all times.

What if no people will work on this project and I'll make it private and I'll be the only one working on it? Or a small subset of people. Does it make sense to use access modifier indicators in these kind of projects?


Solution

  • Your point of view seems to be based on the assumption that what is private or public ( or the equivalent suggestions in python ) is based on the developer who read and write code. That is wrong.

    Even if you write all alone a single application that you only will use, if it is correctly designed it will be divided into modules, and those modules will expose an interface and have a private implementation.

    The fact that you write both the module and the code that uses it doesn't mean that there are not parts which should be private to keep encapsulation. So yes, having the leading underscore to mark parts of a module as private make sense regardless of the number of developer working on the project or depending by it.

    EDIT:

    What we are really discussing about is encapsulation, which is the concept, general to software engineering for python and any other language.

    The idea is that you divide your whole application into pieces (the modules I was talking about, which might be python packages but might also be something else in your design) and decide which of the several functionality needed to perform your goal is implemented there (this is called the Single Responsibility Principle).

    It is a good approach to design by contract, which means to decide the abstraction your module is going to expose to other parts of the software and to hide everything which is not part of it as an implementation detail. Other modules should not rely on your implementation by only on your exposed functionalities so that you are free to change whenever you want to increase performance, favor new functionalities, improve maintainability or any other reason.

    Now, all this theoretical rant is language agnostic and application agnostic, which means that every time you design a piece of software you have to rely on the facilities offered by the language to design your modules and to build encapsulation.

    Python, one of the few if not the only one, as far as I know, has made the deliberate choice (bad, in my opinion), to not enforce encapsulation but to allow developers to have access to everything, as you have already discovered.

    This, however, does not mean that the aforementioned concepts do not apply, but just that they cannot be enforced at the language level and must be implemented in a more loose way, as a simple suggestion.

    Does it mean it is a bad idea to encapsulate implementation and freely use each bit of information available? Obviously not, it is still a good SOLID principle upon which build an architecture.

    Nothing of all this is really necessary, they are just good principles that with time and experience have been proven to create good quality software.

    Should you use in your small application that nobody else uses it? If you want things done as they should be, yes, you should.

    Are they necessary? Well, you can make it work without, but you might discover that it will cost you more effort later down the way.

    What if I'm not writing a library but a finished application? Well, does that mean that you shouldn't write it in a good, clean, tidy way?