
PyYAML - Skipping certain keys when dumping a dict to file


I have a YAML file with configuration data for my application, which is dumped to a new file whenever the application is run for debugging purposes. Unfortunately, some keys in the YAML file hold sensitive data and need to be obfuscated or simply excluded from the dumped file.

Example YAML input file:

logging_config:
    level: INFO
    file_path: /path/to/log_file.log
database_access:
    table_to_query: customer_table
    database_api_key: XXX-XXX-XXX  # Sensitive data, exclude from archived file

There are workarounds, of course:

  • Keeping a list of keys with sensitive data and pre-processing dicts before outputting them to YAML
  • Splitting sensitive and non-sensitive data into separate configuration files and outputting only the latter
  • etc.
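A minimal sketch of the first workaround, assuming the sensitive entries can be identified by key name alone. SENSITIVE_KEYS and redact are hypothetical names for illustration, not part of PyYAML:

```python
import yaml

# Hypothetical set of key names that must never reach the archived file.
SENSITIVE_KEYS = {"database_api_key"}


def redact(obj):
    """Return a copy of obj with every SENSITIVE_KEYS entry removed, at any depth."""
    if isinstance(obj, dict):
        return {k: redact(v) for k, v in obj.items() if k not in SENSITIVE_KEYS}
    if isinstance(obj, list):
        return [redact(item) for item in obj]
    return obj


config = yaml.safe_load("""
logging_config:
    level: INFO
    file_path: /path/to/log_file.log
database_access:
    table_to_query: customer_table
    database_api_key: XXX-XXX-XXX
""")

# The original config is left untouched; only the dumped copy is filtered.
print(yaml.safe_dump(redact(config)))
```

The downside, as noted, is that the list of sensitive keys lives in code rather than next to the values in the YAML file.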

But I was hoping for a solution along the lines of a custom Loader that reacts to a tag like !keep_secret whenever it appears on a dict value, since that would keep my configuration files more readable.
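A sketch of that tag-driven approach, assuming PyYAML's SafeLoader and SafeDumper are acceptable as a base: a constructor registered for the !keep_secret tag wraps the value in a marker type, and a representer registered for that type masks it on dump. Secret, SecretLoader, SecretDumper and the REDACTED placeholder are hypothetical names chosen for illustration:

```python
import yaml


class Secret(str):
    """Hypothetical marker type for values tagged !keep_secret."""


class SecretLoader(yaml.SafeLoader):
    """SafeLoader subclass that understands the !keep_secret tag."""


class SecretDumper(yaml.SafeDumper):
    """SafeDumper subclass that masks Secret values."""


def construct_secret(loader, node):
    # Wrap the scalar so the application can still use it as a normal string.
    return Secret(loader.construct_scalar(node))


def represent_secret(dumper, data):
    # Emit a fixed placeholder instead of the real value.
    return dumper.represent_scalar("tag:yaml.org,2002:str", "REDACTED")


# Registering on subclasses keeps the global SafeLoader/SafeDumper unchanged.
SecretLoader.add_constructor("!keep_secret", construct_secret)
SecretDumper.add_representer(Secret, represent_secret)

config = yaml.load("""
database_access:
    table_to_query: customer_table
    database_api_key: !keep_secret XXX-XXX-XXX
""", Loader=SecretLoader)

print(yaml.dump(config, Dumper=SecretDumper))
```

Because Secret subclasses str, the loaded value behaves like an ordinary string inside the application; only dumps made with SecretDumper mask it.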


Solution

  • You can use a custom representer. Here's a basic example:

    import yaml
    
    class SensitiveText:
        """Wrapper that marks a config value as sensitive."""
    
        def __init__(self, content):
            self.content = content
    
        def __repr__(self):
            return self.content
    
        def __str__(self):
            return self.content
    
    def sensitive_text_remover(dumper, data):
        # Emit an empty null scalar in place of the wrapped value.
        return dumper.represent_scalar("tag:yaml.org,2002:null", "")
    
    # Register on the default Dumper used by yaml.dump().
    yaml.add_representer(SensitiveText, sensitive_text_remover)
    
    data = {
        "logging_config": {
            "level": "INFO",
            "file_path": "/path/to/log_file.log"
        },
        "database_access": {
            "table_to_query": "customer_table",
            "database_api_key": SensitiveText("XXX-XXX-XXX")
        }
    }
    
    print(yaml.dump(data))
    

    This prints:

    database_access:
      database_api_key:
      table_to_query: customer_table
    logging_config:
      file_path: /path/to/log_file.log
      level: INFO
    

    Alternatively, you could represent database_access as an instance of a custom class whose representer omits database_api_key altogether.
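A sketch of that alternative, assuming the database_access section is built as (or converted into) a custom object before dumping. DatabaseAccess and represent_database_access are hypothetical names, not part of PyYAML:

```python
import yaml


class DatabaseAccess:
    """Hypothetical container for the database_access section."""

    def __init__(self, table_to_query, database_api_key):
        self.table_to_query = table_to_query
        self.database_api_key = database_api_key


def represent_database_access(dumper, data):
    # Serialize as a plain mapping, deliberately leaving out database_api_key.
    return dumper.represent_dict({"table_to_query": data.table_to_query})


# Register on the default Dumper used by yaml.dump().
yaml.add_representer(DatabaseAccess, represent_database_access)

data = {
    "logging_config": {"level": "INFO", "file_path": "/path/to/log_file.log"},
    "database_access": DatabaseAccess("customer_table", "XXX-XXX-XXX"),
}

print(yaml.dump(data))
```

Unlike the null-scalar representer, this drops the key from the output entirely, at the cost of one class per section that holds secrets.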