Build custom SnakeYAML Constructor to deserialize yaml file in a modular way


I would like to parse yaml files like the following using SnakeYAML:

config:
  someBoolean: true
  someString: testing action descriptors
actions:
- print: Hello world
- print: Next action is add
- add:
    left: 25
    right: 17
- print: done

The target type for this document is DocumentRoot:

public class DocumentRoot {
    public Config config;
    public List<Map<String, Object>> actions;
}

public class Config {
    public String someString;
    public boolean someBoolean;
}

Most of the document, such as the config attribute, should be parsed by SnakeYAML directly into Java objects. The actions attribute, however, should be parsed in a modular way. Consider the following ActionDescriptors:

public interface ActionDescriptor<T> {
    String actionKey();

    Class<T> actionValueType();

    void runAction(T actionValue);
}

public class AddExpression {
    public int left;
    public int right;
}

private static List<ActionDescriptor<?>> createDescriptors() {
    return List.of(new ActionDescriptor<String>() {
        @Override
        public String actionKey() {
            return "print";
        }

        @Override
        public Class<String> actionValueType() {
            return String.class;
        }

        @Override
        public void runAction(String actionValue) {
            System.out.println(actionValue);
        }
    }, new ActionDescriptor<AddExpression>() {
        @Override
        public String actionKey() {
            return "add";
        }

        @Override
        public Class<AddExpression> actionValueType() {
            return AddExpression.class;
        }

        @Override
        public void runAction(AddExpression actionValue) {
            System.out.println("calculated: " + (actionValue.left + actionValue.right));
        }
    });
}

I would now like to use these ActionDescriptors to process the actions attribute in the following way:

public static void main(String[] args) throws IOException {
    List<ActionDescriptor<?>> descriptors = createDescriptors();
    DocumentRoot documentRoot = createYaml(descriptors).loadAs(new FileInputStream("data/input.yaml"),
            DocumentRoot.class);
    Map<String, ActionDescriptor<?>> descriptorMap = descriptors.stream()
            .collect(Collectors.toMap(ActionDescriptor::actionKey, Function.identity()));
    if (documentRoot.config.someBoolean) {
        System.out.println(documentRoot.config.someString);
        for (Map<String, Object> actionMap : documentRoot.actions) {
            for (Entry<String, Object> entry : actionMap.entrySet()) {
                runAction(entry.getValue(), descriptorMap.get(entry.getKey()));
            }
        }
    }
}

private static <T> void runAction(Object actionValue, ActionDescriptor<T> descriptor) {
    Class<T> valueType = descriptor.actionValueType();
    if (valueType.isInstance(actionValue)) {
        descriptor.runAction(valueType.cast(actionValue));
    } else {
        System.out.println("expected '" + valueType + "' but got '" + actionValue.getClass() + "'");
    }
}

Currently I use the following method to create the Yaml instance of SnakeYAML:

private static Yaml createYaml(List<ActionDescriptor<?>> descriptors) {
    Constructor constructor = new Constructor(DocumentRoot.class);
    for (ActionDescriptor<?> descriptor : descriptors) {
        // ???
        constructor.addTypeDescription(new TypeDescription(descriptor.actionValueType()));
    }
    Yaml yaml = new Yaml(constructor);
    yaml.setBeanAccess(BeanAccess.FIELD);
    return yaml;
}

When running the program I get the following output:

testing action descriptors
Hello world
Next action is add
expected 'class animatex.so.AddExpression' but got 'class java.util.LinkedHashMap'
done

But I would like to have the following one:

testing action descriptors
Hello world
Next action is add
calculated: 42
done

Clearly, SnakeYAML does not use the desired types when deserializing the action values. So at the location marked ???, I need to tell SnakeYAML the following: when it deserializes the value of a map entry (where the map is an element of the list in the actions attribute), it should use the type descriptor.actionValueType() if the key of that entry is descriptor.actionKey().

I already tried several things using TypeDescriptions, Constructors and Constructs and dug into the code of SnakeYAML, but I do not really understand how it works, so I am unable to build a working constructor for this use case.

If it helps, I can also extend the ActionDescriptor interface to provide a TypeDescription, Constructor, Construct ...

I would really like to avoid adding tags to the yaml file, but if there is no other solution I might bite that bullet.

My question is: How can I build such a Constructor? Looking forward to your comments and answers :-)


Solution

  • The first step is to avoid nested generics. To do so, we can adjust the class DocumentRoot as follows:

    public class DocumentRoot {
        public Config config;
        public List<ActionMap> actions;
    }
    
    public class ActionMap {
        private final Map<String, Object> actions;
    
        public ActionMap(Map<String, Object> actions) {
            this.actions = actions;
        }
    }
    

    We wrapped the map into an object of type ActionMap. Now we need to tell SnakeYAML how to parse a MappingNode (anything that looks like a map in the yaml file) into an object of type ActionMap. I found a way to extend the class org.yaml.snakeyaml.constructor.Constructor in such a way that this is easily possible:

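    import java.util.Map;
    import java.util.Map.Entry;
    import java.util.function.BiFunction;
    import java.util.function.Function;

    import org.yaml.snakeyaml.constructor.Constructor;
    import org.yaml.snakeyaml.error.YAMLException;
    import org.yaml.snakeyaml.nodes.MappingNode;
    import org.yaml.snakeyaml.nodes.Node;
    import org.yaml.snakeyaml.nodes.NodeId;
    import org.yaml.snakeyaml.nodes.SequenceNode;

    public class MyConstructor extends Constructor {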
        public MyConstructor(Class<?> rootClass,
                Map<Class<?>, BiFunction<Function<Node, Object>, MappingNode, Object>> mappingNodeConstructors,
                Map<Class<?>, BiFunction<Function<Node, Object>, SequenceNode, Object>> sequenceNodeConstructors) {
            super(rootClass);
            this.yamlClassConstructors.put(NodeId.mapping, new ConstructMapping() {
                @Override
                public Object construct(Node node) {
                    for (Entry<Class<?>, BiFunction<Function<Node, Object>, MappingNode, Object>> entry : mappingNodeConstructors
                            .entrySet()) {
                        if (entry.getKey().isAssignableFrom(node.getType())) {
                            if (node.isTwoStepsConstruction()) {
                                throw new YAMLException("Unexpected 2nd step. Node: " + node);
                            } else {
                                return entry.getValue().apply(MyConstructor.this::constructObject, (MappingNode) node);
                            }
                        }
                    }
                    return super.construct(node);
                }
    
                @Override
                public void construct2ndStep(Node node, Object object) {
                    throw new YAMLException("Unexpected 2nd step. Node: " + node);
                }
            });
            this.yamlClassConstructors.put(NodeId.sequence, new ConstructSequence() {
                @Override
                public Object construct(Node node) {
                    for (Entry<Class<?>, BiFunction<Function<Node, Object>, SequenceNode, Object>> entry : sequenceNodeConstructors
                            .entrySet()) {
                        if (entry.getKey().isAssignableFrom(node.getType())) {
                            if (node.isTwoStepsConstruction()) {
                                throw new YAMLException("Unexpected 2nd step. Node: " + node);
                            } else {
                                return entry.getValue().apply(MyConstructor.this::constructObject, (SequenceNode) node);
                            }
                        }
                    }
                    return super.construct(node);
                }
    
                @Override
                public void construct2ndStep(Node node, Object object) {
                    throw new YAMLException("Unexpected 2nd step. Node: " + node);
                }
            });
        }
    }
    

    Note that we completely ignore SnakeYAML's so-called 2nd step which, to my understanding, is only needed for YAML files that use references (anchors and aliases). Since I don't need this feature, I ignored it: MyConstructor throws a YAMLException if a 2nd step is ever requested. Also note that we don't need to handle SequenceNodes for this example, but the hook might still be useful for some people.

    SnakeYAML's parsing works as follows:

    1. Parse the document into Node-Objects
    2. Tag Node-Objects with a target type
    3. Convert Node-Objects to the desired target type

    For step three, SnakeYAML uses the construct method of ConstructMapping to convert a MappingNode (anything that looks like a map in the YAML file) into its target type. Similarly, it uses the construct method of ConstructSequence to convert a SequenceNode (anything that looks like a list in the YAML file) into its target type.
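
    To make step three more concrete, here is a toy model of the lookup that MyConstructor performs. It does not use SnakeYAML at all: ToyNode, Point, CUSTOM and construct are hypothetical names invented for this sketch, and ToyNode merely stands in for a MappingNode that has already been given a target type in step two.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class TypeDispatchSketch {

    /** Hypothetical stand-in for a mapping node: raw entries plus the target type from step 2. */
    static final class ToyNode {
        final Class<?> type;
        final Map<String, Object> entries;

        ToyNode(Class<?> type, Map<String, Object> entries) {
            this.type = type;
            this.entries = entries;
        }
    }

    /** Example target type that gets a hand-written conversion. */
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Custom conversions, keyed by the target type they can produce. */
    static final Map<Class<?>, Function<ToyNode, Object>> CUSTOM = new LinkedHashMap<>();

    static {
        CUSTOM.put(Point.class,
                node -> new Point((Integer) node.entries.get("x"), (Integer) node.entries.get("y")));
    }

    /** Step 3: use the first registered conversion matching the node's type, else fall back. */
    static Object construct(ToyNode node) {
        for (Map.Entry<Class<?>, Function<ToyNode, Object>> entry : CUSTOM.entrySet()) {
            if (entry.getKey().isAssignableFrom(node.type)) {
                return entry.getValue().apply(node);
            }
        }
        // Fallback, analogous to super.construct(node): keep the data as a plain map.
        return new LinkedHashMap<>(node.entries);
    }

    public static void main(String[] args) {
        Map<String, Object> raw = Map.of("x", 1, "y", 2);
        // Node typed as Point: the custom conversion is used.
        System.out.println(construct(new ToyNode(Point.class, raw)).getClass().getSimpleName());
        // Node typed as Map: no custom conversion matches, so the fallback is used.
        System.out.println(construct(new ToyNode(Map.class, raw)).getClass().getSimpleName());
    }
}
```

    The point of the sketch is only the control flow: the node's target type selects the conversion, and everything without a registered conversion goes down the default path.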

    Now we can use an instance of MyConstructor to tell SnakeYAML how to parse a MappingNode into an ActionMap:

    private static Yaml createYaml(List<ActionDescriptor<?>> descriptors) {
        Yaml yaml = new Yaml(createConstructor(descriptors));
        yaml.setBeanAccess(BeanAccess.FIELD);
        return yaml;
    }
    
    private static Constructor createConstructor(List<ActionDescriptor<?>> descriptors) {
        Map<String, ActionDescriptor<?>> descriptorMap = descriptors.stream()
                .collect(Collectors.toMap(ActionDescriptor::actionKey, Function.identity()));
        Constructor result = new MyConstructor(DocumentRoot.class, Map.of(ActionMap.class, (constructor, mnode) -> {
            Map<String, Object> actionMap = new LinkedHashMap<>();
            for (NodeTuple entry : mnode.getValue()) {
                Node actionKeyNode = entry.getKeyNode();
                Node actionValueNode = entry.getValueNode();
    /* (1) */   String actionKey = (String) constructor.apply(actionKeyNode);
    /* (2) */   Class<?> actionValueType = descriptorMap.get(actionKey).actionValueType();
    /* (3) */   actionValueNode.setType(actionValueType);
    /* (4) */   Object actionValue = constructor.apply(actionValueNode);
    /* (5) */   actionMap.put(actionKey, actionValue);
            }
            return new ActionMap(actionMap);
        }), Map.of());
        TypeDescription typeDescription = new TypeDescription(DocumentRoot.class);
        typeDescription.addPropertyParameters("actions", ActionMap.class);
        result.addTypeDescription(typeDescription);
        return result;
    }
    

    Here, we tell MyConstructor that it can convert a MappingNode into an ActionMap by using the given lambda. This lambda iterates all entries of the MappingNode. For each entry it (1) extracts the actionKey, (2) determines the actionValueType based on the actionKey, (3) tags the value Node of the entry with the actionValueType, (4) calls back into SnakeYAML to convert the value Node into the actionValueType and (5) creates a new entry in the actionMap for the determined actionKey and actionValue. Finally it wraps the actionMap into an ActionMap.

    Finally, the method createConstructor creates a TypeDescription to tell SnakeYAML that the generic type parameter of the actions attribute of the class DocumentRoot is ActionMap. This is necessary due to Java's type erasure.

    I adjusted the code to actually run the actions as follows:

    public static void main(String[] args) throws IOException {
        List<ActionDescriptor<?>> descriptors = createDescriptors();
        DocumentRoot documentRoot = createYaml(descriptors).loadAs(new FileInputStream("data/input.yaml"),
                DocumentRoot.class);
        if (documentRoot.config.someBoolean) {
            System.out.println(documentRoot.config.someString);
            for (ActionMap actionMap : documentRoot.actions) {
                for (ActionDescriptor<?> descriptor : descriptors) {
                    runAction(actionMap, descriptor);
                }
            }
        }
    }
    
    private static <T> void runAction(ActionMap actionMap, ActionDescriptor<T> descriptor) {
        actionMap.getActionValue(descriptor).ifPresent(v -> descriptor.runAction(v));
    }
    

    Where getActionValue is a method in the class ActionMap:

    public <T> Optional<T> getActionValue(ActionDescriptor<T> descriptor) {
        if (actions.containsKey(descriptor.actionKey())) {
            Object actionValue = actions.get(descriptor.actionKey());
            Class<T> valueType = descriptor.actionValueType();
            if (valueType.isInstance(actionValue)) {
                return Optional.of(valueType.cast(actionValue));
            } else {
                throw new RuntimeException("expected '" + valueType + "' but got '" + actionValue.getClass() + "'");
            }
        } else {
            return Optional.empty();
        }
    }
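
    The behaviour of this typed lookup can be tried out without SnakeYAML. The following is a minimal sketch under some assumptions: the descriptor(...) factory is a hypothetical helper, the interface is reduced to the two methods the lookup needs, and getActionValue is written as a static method taking the map as a parameter instead of living in ActionMap. Unlike the method above, it also treats a null value as absent.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class TypedLookupSketch {

    /** Reduced version of the ActionDescriptor interface: only what the lookup needs. */
    interface ActionDescriptor<T> {
        String actionKey();

        Class<T> actionValueType();
    }

    /** Hypothetical factory to keep the example short. */
    static <T> ActionDescriptor<T> descriptor(String key, Class<T> type) {
        return new ActionDescriptor<T>() {
            @Override
            public String actionKey() {
                return key;
            }

            @Override
            public Class<T> actionValueType() {
                return type;
            }
        };
    }

    /** Returns the value stored under the descriptor's key, checked against the descriptor's type. */
    static <T> Optional<T> getActionValue(Map<String, Object> actions, ActionDescriptor<T> descriptor) {
        Object value = actions.get(descriptor.actionKey());
        if (value == null) {
            return Optional.empty(); // key missing (or mapped to null): nothing to run
        }
        Class<T> type = descriptor.actionValueType();
        if (!type.isInstance(value)) {
            throw new IllegalStateException("expected '" + type + "' but got '" + value.getClass() + "'");
        }
        return Optional.of(type.cast(value)); // checked cast, no unchecked warning
    }

    public static void main(String[] args) {
        Map<String, Object> actions = new LinkedHashMap<>();
        actions.put("print", "Hello world");

        // Key present with the expected type: the value comes back as a String.
        Optional<String> text = getActionValue(actions, descriptor("print", String.class));
        System.out.println(text.orElse("<absent>"));

        // Key absent: empty Optional, so the corresponding action is simply skipped.
        System.out.println(getActionValue(actions, descriptor("add", Integer.class)).isPresent());
    }
}
```

    Because the Class<T> token travels with the descriptor, the caller gets a properly typed value and never has to cast.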
    

    As @flyx pointed out in their answer, this approach implements "poor man's tags" instead of using the existing tagging feature of YAML and SnakeYAML. So before adopting this approach, consider whether the existing tagging feature would serve you better.

    However, this approach is just what I want, and I might not be the only one: Ansible, for example, seems to use a similar YAML layout for its task lists. In my actual use case it also makes sense to have multiple actions in a single list entry, which is not directly possible with YAML tags.

    In a real-world application one probably wants to add better error handling and a class that is more specialized than the type BiFunction<Function<Node, Object>, MappingNode, Object>. I omitted these refinements to prevent this answer from becoming even longer.
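
    As one example of such error handling: in createConstructor, descriptorMap.get(actionKey) returns null for an action key that has no descriptor, so a typo in the YAML file ends in a NullPointerException. The following JDK-only sketch shows one possible guard. The class name, the method requireValueType and the simplified Map<String, Class<?>> (standing in for the descriptor map, with Integer standing in for AddExpression) are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DescriptorLookupSketch {

    /**
     * Returns the value type registered for actionKey, or fails with a message
     * that names the offending key and lists the known ones.
     */
    static Class<?> requireValueType(Map<String, Class<?>> valueTypesByKey, String actionKey) {
        Class<?> valueType = valueTypesByKey.get(actionKey);
        if (valueType == null) {
            throw new IllegalArgumentException(
                    "Unknown action '" + actionKey + "', known actions: " + valueTypesByKey.keySet());
        }
        return valueType;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> valueTypesByKey = new LinkedHashMap<>();
        valueTypesByKey.put("print", String.class);
        valueTypesByKey.put("add", Integer.class);

        System.out.println(requireValueType(valueTypesByKey, "print")); // class java.lang.String
        try {
            requireValueType(valueTypesByKey, "prnt"); // typo in the YAML file
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
            // Unknown action 'prnt', known actions: [print, add]
        }
    }
}
```

    Inside the SnakeYAML lambda one would probably throw a YAMLException instead and include the position of the key node (Node.getStartMark()) in the message, so that the user can find the offending line.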