Retrieve bulk data from YAML using Python


I have a YAML file of the form below:

Solution: 
- number of solutions: 1
  number of solutions displayed: 1
- Gap: None
  Status: optimal
  Message: bonmin\x3a Optimal
  Objective:
    objective:
      Value: 0.010981105395
  Variable:
    battery_E[b1,1,1]:
      Value: 0.25
    battery_E[b1,1,2]:
      Value: 0.259912707017
    battery_E[b1,2,1]:
      Value: 0.120758408109
    battery_E[b2,1,1]:
      Value: 0.0899999972181
    battery_E[b2,2,3]:
      Value: 0.198967393893
    windfarm_L[w1,2,3]:
      Value: 1
    windfarm_L[w1,3,1]:
      Value: 1
    windfarm_L[w1,3,2]:
      Value: 1

Using Python 2.7, I would like to import all battery_E values from this YAML file. I know I can iterate over the keys of the battery_E dictionary to retrieve them one by one (I am already doing this with PyYAML), but I would like to avoid iterating and do it in one go!


Solution

  • It's not possible "in one go": some kind of iteration will happen either way, and that's completely OK.

    However, if memory is a concern, you can keep only the values of the keys of interest while the YAML is being loaded:

    from __future__ import print_function
    
    import yaml
    
    KEY = 'battery_E'
    
    
    class Loader(yaml.SafeLoader):
        def __init__(self, stream):
            super(Loader, self).__init__(stream)
            self.values = []
    
        def compose_mapping_node(self, anchor):
            start_event = self.get_event()
            tag = start_event.tag
            if tag is None or tag == '!':
                tag = self.resolve(yaml.MappingNode, None, start_event.implicit)
            node = yaml.MappingNode(tag, [],
                                    start_event.start_mark, None,
                                    flow_style=start_event.flow_style)
            if anchor is not None:
                self.anchors[anchor] = node
            while not self.check_event(yaml.MappingEndEvent):
                item_key = self.compose_node(node, None)
                item_value = self.compose_node(node, item_key)
                # Match keys such as "battery_E[b1,1,1]". Testing the prefix together
                # with "[" avoids an IndexError when a key is exactly KEY.
                if (isinstance(item_key, yaml.ScalarNode)
                        and item_key.value.startswith(KEY + '[')):
                    self.values.append(self.construct_object(item_value, deep=True))
                else:
                    node.value.append((item_key, item_value))
            end_event = self.get_event()
            node.end_mark = end_event.end_mark
            return node
    
    
    with open('test.yaml') as f:
        loader = Loader(f)
        try:
            loader.get_single_data()
        finally:
            loader.dispose()
    
        print(loader.values)
    

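    Note, however, that this code assumes nothing about where the battery_E keys sit in the tree inside the YAML file: it collects the value of every mapping key that starts with battery_E[, at any depth. Each collected item is the mapping stored under that key (for example {'Value': 0.25}), not the bare number.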
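
If memory is not a concern and the position of the keys is known, a plain load plus a comprehension is probably enough. The following is only a minimal sketch: it assumes the battery_E entries always live under Solution[1]['Variable'], as in the sample from the question, and the names DOCUMENT, PREFIX and battery_values are made up for illustration. The sample is trimmed and inlined so the snippet runs on its own.

```python
from __future__ import print_function

import yaml

# Trimmed copy of the file from the question, inlined to keep the sketch self-contained.
DOCUMENT = """
Solution:
- number of solutions: 1
  number of solutions displayed: 1
- Gap: None
  Status: optimal
  Variable:
    battery_E[b1,1,1]:
      Value: 0.25
    battery_E[b2,1,1]:
      Value: 0.0899999972181
    windfarm_L[w1,2,3]:
      Value: 1
"""

# Hypothetical constant: the "[" keeps other names that merely start with "battery_E" out.
PREFIX = 'battery_E['


def battery_values(text, prefix=PREFIX):
    """Return {key: number} for every variable whose name starts with prefix."""
    data = yaml.safe_load(text)
    # Assumption: the variables sit in the second item of the "Solution" list.
    variables = data['Solution'][1]['Variable']
    # This is still an iteration, just written as a single expression.
    return {name: entry['Value']
            for name, entry in variables.items()
            if name.startswith(prefix)}


values = battery_values(DOCUMENT)
print(values)
```

Unlike the custom loader, this builds the whole document in memory first and only looks in one place in the tree, so it trades memory for simplicity.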