Search code examples
pythonjsonabstract-classconstantsreadonly-attribute

Pythonic approach to internal API for strings


Question

Is there a "pythonic" (i.e. canonical, official, PEP8-approved, etc) way to re-use string literals in python internal (and external) APIs?


Background

For example, I'm working with some (inconsistent) JSON-handling code (thousands of lines) where there are various JSON "structs" we assemble, parse, etc. One of the recurring problems that comes up during code reviews is different JSON structs that use the same internal parameter names, causing confusion and eventually causing bugs to arise, e.g.:

pathPacket['src'] = "/tmp"
pathPacket['dst'] = "/home/user/out"
urlPacket['src'] = "localhost"
urlPacket['dst'] = "contoso"

These two (example) packets that have dozens of identically named fields, but they represent very different types of data. There was no code-reuse justification for this implementation. People typically use code-completion engines to get the members of the JSON struct, and this eventually leads to hard-to-debug problems down the road due to mis-typed string literals causing functional issues, and not triggering an error earlier on. When we have to change these APIs, it takes a lot of time to hunt down the string literals to find out which JSON structs use which fields.


Question - Redux

Is there a better approach to this that is common amongst members of the python community? If I was doing this in C++, the earlier example would be something like:

const char *JSON_PATH_SRC = "src";
const char *JSON_PATH_DST = "dst";
const char *JSON_URL_SRC = "src";
const char *JSON_URL_DST = "dst";
// Define/allocate JSON structs
pathPacket[JSON_PATH_SRC] = "/tmp";
pathPacket[JSON_PATH_DST] = "/home/user/out";
urlPacket[JSON_URL_SRC] = "localhost";
urlPacket[JSON_URL_SRC] = "contoso";

My initial approach would be to:

  • Use abc to make an abstract base class that can't be initialized as an object, and populate it with read-only constants.
  • Use that class as a common module throughout my project.
  • By using these constants, I can reduce the chance of a monkey-patching error as the symbols won't exist if mis-spelled, whereas a string literal typo can slip through code reviews.

My Proposed Solution (open to advice/criticism)

from abc import ABCMeta

class Custom_Structure:
    __metaclass__ = ABCMeta

    @property
    def JSON_PATH_SRC():
        return self._JSON_PATH_SRC

    @property
    def JSON_PATH_DST():
        return self._JSON_PATH_DST

    @property
    def JSON_URL_SRC():
        return self._JSON_URL_SRC

    @property
    def JSON_URL_DST():
        return self._JSON_URL_DST

Solution

  • The way this is normally done is:

    JSON_PATH_SRC = "src"
    JSON_PATH_DST = "dst"
    JSON_URL_SRC = "src"
    JSON_URL_DST = "dst"
    
    
    pathPacket[JSON_PATH_SRC] = "/tmp"
    pathPacket[JSON_PATH_DST] = "/home/user/out"
    urlPacket[JSON_URL_SRC] = "localhost"
    urlPacket[JSON_URL_SRC] = "contoso"
    

    Upper-case to denote "constants" is the way it goes. You'll see this in the standard library, and it's even recommended in PEP8:

    Constants are usually defined on a module level and written in all capital letters with underscores separating words. Examples include MAX_OVERFLOW and TOTAL.

    Python doesn't have true constants, and it seems to have survived without them. If it makes you feel more comfortable wrapping this in a class that uses ABCmeta with properties, go ahead. Indeed, I'm pretty sure abc.ABCmeta doesn't not prevent object initialization. Indeed, if it did, your use of property would not work! property objects belong to the class, but are meant to be accessed from an instance. To me, it just looks like a lot of rigamarole for very little gain.