Best way to store dicts, regex and variables in external file?


For configuration purposes, if I store an "easy" regex in a JSON file and load it into my Python program, it works just fine.

{
    "allow": ["\/word\/.*"],
    "follow": true
},

If I store a more complex regex in a JSON file, the same Python program fails.

{
    "allow": ["dcp\=[0-9]+\&dppp\="],
    "follow": true
},

That's the code that loads my JSON file:

src_json = kw.get('src_json') or 'sources/sample.json'
self.MY_SETTINGS = json.load(open(src_json))

and the error is usually the same. My online searches led me to the claim that regular expressions should not be stored in JSON files.

json.decoder.JSONDecodeError: Invalid \escape: line 22 column 38 (char 801)

YAML files seem to have similar limitations, so I guess I shouldn't go down that road.

Now, I've stored my expression inside a dict in a separate file:

mydict = {"allow": "com\/[a-z]+(?:-[a-z]+)*\?skid\="}

and load it from my program file:

exec(compile(source=open('expr.py').read(), filename='expr.py', mode='exec'))

print(mydict)

Which works and would be fine with me - but it looks a bit ... special ... with exec and compile.

Is there any reason not to do it in this way? Is there a better way to store complex data structures and regular expressions in external files which I can open / use in my program code?


Solution

  • First, regular expressions can be stored in JSON, but they must be written as valid JSON. Invalid escaping is the cause of the JSONDecodeError in your example.

    There are other answers here on SO that explain how to properly encode/decode a regex as valid JSON, such as: Escaping Regex to get Valid JSON
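As a minimal sketch of what "valid JSON" means here: JSON only accepts a fixed set of backslash escapes (`\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t`, `\uXXXX`). That is why `\/` in the first example loaded fine while `\=` and `\&` did not. Doubling each regex backslash in the file fixes it, and `json.dumps` can produce the escaped form for you. The sample URL fragment below is made up for illustration.

```python
import json
import re

# What the JSON file would need to contain: every regex backslash is doubled.
# (A raw string is used so Python itself leaves the backslashes untouched.)
valid_json = r'{"allow": ["dcp\\=[0-9]+\\&dppp\\="], "follow": true}'
settings = json.loads(valid_json)
print(settings["allow"][0])  # the decoded pattern has single backslashes again

# The decoded string can be compiled like any other pattern.
pattern = re.compile(settings["allow"][0])
print(bool(pattern.search("page?dcp=12&dppp=3")))  # hypothetical input

# Single backslashes, as in the question, are rejected by the JSON parser.
invalid_json = r'{"allow": ["dcp\=[0-9]+\&dppp\="], "follow": true}'
try:
    json.loads(invalid_json)
    rejected = False
except json.JSONDecodeError:
    rejected = True

# Letting json.dumps write the file content avoids escaping by hand.
encoded = json.dumps({"allow": [r"dcp\=[0-9]+\&dppp\="], "follow": True})
print(encoded)
```

If the config files are edited by hand, the doubled backslashes are the price of staying in JSON; generating the files with `json.dumps` sidesteps that.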

    The rest of your question is more a matter of best practices and opinions.

    As you've seen, you certainly can declare & use variables from other files:

    test_regex.py

    mydict = {'allow': 'com\\/[a-z]+(?:-[a-z]+)*\\?skid\\='}
    

    script.py

    from test_regex import mydict
    print(mydict)
    # {'allow': 'com\\/[a-z]+(?:-[a-z]+)*\\?skid\\='}
    

    However, this is a rather different use case. In the JSON example, the information is stored so that it can be configured easily: different JSON files could be used (perhaps for different environment configurations), each with different regexes. In the Python module example, we don't assume configurability; instead, test_regex is used for separation of concerns and readability.
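To illustrate the configurable JSON route, here is one possible sketch of a loader that reads a settings file and precompiles its patterns. The helper name `load_settings`, the temporary file, and the sample URL are assumptions made for this example, not part of the original code; only the "allow" and "follow" keys come from the question.

```python
import json
import re
import tempfile
from pathlib import Path


def load_settings(path):
    """Load a JSON settings file and compile the patterns under "allow".

    Hypothetical helper for illustration only.
    """
    with open(path, encoding="utf-8") as fh:  # closes the file when done
        settings = json.load(fh)
    settings["allow"] = [re.compile(p) for p in settings.get("allow", [])]
    return settings


# Demo: write a config file with json.dumps (which handles the escaping),
# then load it back the way the program would.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "sample.json"
    config = {"allow": [r"com\/[a-z]+(?:-[a-z]+)*\?skid\="], "follow": True}
    config_path.write_text(json.dumps(config), encoding="utf-8")
    settings = load_settings(config_path)

# Hypothetical URL to check against the loaded patterns.
url = "example.com/foo-bar?skid=1"
matched = any(p.search(url) for p in settings["allow"])
print(matched)
```

Swapping in a different JSON file per environment then only changes the path passed to the loader, and nothing needs to go through exec.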