Tags: python, ruby-on-rails, yaml, pyyaml

How do I read a custom serialized YAML object (written by Rails) with Python?


I am working with a Rails database that contains serialized values in one column. These values should have been regular Hashes, but because the parameters were not sanitized properly, they were stored as HashWithIndifferentAccess or Parameters objects. For example, one column entry looks like this:

--- !ruby/object:ActionController::Parameters
parameters: !ruby/hash:ActiveSupport::HashWithIndifferentAccess
  windowHeight: 946
  documentHeight: 3679
  scrollTop: 500
permitted: false

I want to read this with Python's yaml implementation, but when I try to do so, I get:

*** yaml.constructor.ConstructorError: could not determine a constructor for the tag '!ruby/object:ActionController::Parameters'
  in "<unicode string>", line 1, column 5:
    --- !ruby/object:ActionController::P ...
        ^

So, for some reason it expects a constructor. But quite obviously, the value itself is just a regular dictionary. How can I still read it?


Solution

  • You can use PyYAML's add_constructor(tag, constructor) function, which registers a custom constructor for a tag that the loader does not recognize. The constructor itself is a function that takes (loader, node) and returns the Python object for that node.

    In that constructor, the function loader.construct_pairs(node) can be called to obtain key-value tuples from the original node contents. Using a dictionary comprehension, we can create the original dictionary.

    Since the entry is nested and each level carries its own Ruby tag, we have to register the constructor for both tags.

    A complete example would be the following:

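    import yaml

    def convert_entry(loader, node):
        # Turn the node's key-value pairs into a plain dict;
        # deep=True constructs nested values immediately.
        return {key: value for key, value in loader.construct_pairs(node, deep=True)}

    # Register the same constructor for both Ruby tags on the safe loader.
    yaml.add_constructor('!ruby/hash:ActiveSupport::HashWithIndifferentAccess', convert_entry, Loader=yaml.SafeLoader)
    yaml.add_constructor('!ruby/object:ActionController::Parameters', convert_entry, Loader=yaml.SafeLoader)

    # input_string holds the serialized column value.
    # Recent PyYAML versions require an explicit loader; safe_load uses SafeLoader.
    data = yaml.safe_load(input_string)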
    

    Custom constructors are covered in the PyYAML documentation, but examples are hard to find.
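
If the column also contains other Ruby-specific tags, registering each one by hand gets tedious. As a rough sketch of an alternative, PyYAML's add_multi_constructor can register one handler for every tag that starts with a given prefix. The names RubyTolerantLoader, construct_ruby_mapping and unwrap_parameters below are made up for illustration. The sketch assumes that every `!ruby/` tag in the data sits on a mapping, and that a serialized Parameters object keeps its payload under the `parameters` key, as in the entry from the question.

```python
import yaml


class RubyTolerantLoader(yaml.SafeLoader):
    """SafeLoader subclass (hypothetical name) that accepts '!ruby/' tags."""


def construct_ruby_mapping(loader, tag_suffix, node):
    # tag_suffix is the part of the tag after '!ruby/', for example
    # 'object:ActionController::Parameters'. It is ignored here.
    # Assumption: the tagged node is a mapping. For a scalar or a
    # sequence, construct_mapping raises a ConstructorError.
    # deep=True constructs nested values immediately.
    return loader.construct_mapping(node, deep=True)


# One handler for every tag beginning with '!ruby/'. Registering on the
# subclass leaves the stock SafeLoader untouched.
RubyTolerantLoader.add_multi_constructor('!ruby/', construct_ruby_mapping)


def unwrap_parameters(data):
    # Assumption: a serialized Parameters object is a mapping with the
    # keys 'parameters' and 'permitted', as in the question's entry.
    if isinstance(data, dict) and 'parameters' in data and 'permitted' in data:
        return data['parameters']
    return data


# The column entry from the question.
SAMPLE = """\
--- !ruby/object:ActionController::Parameters
parameters: !ruby/hash:ActiveSupport::HashWithIndifferentAccess
  windowHeight: 946
  documentHeight: 3679
  scrollTop: 500
permitted: false
"""

loaded = yaml.load(SAMPLE, Loader=RubyTolerantLoader)
plain = unwrap_parameters(loaded)
print(plain)
```

Here `loaded` still contains the `parameters` and `permitted` keys, while `plain` is only the inner dictionary, which is closer to the regular Hash the column was meant to hold. Keeping the handler on a dedicated loader class is a design choice: code elsewhere that calls yaml.safe_load keeps rejecting unknown tags.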