Using libyaml to parse a tree-like structure


I am a newbie to YAML and I want to parse the following YAML file:

basket :
 size : 10
 type : organic
 fruit1:
  mango : 5
  type : farm-fresh
 fruit2:
  peach : 43
  manufacturer : xyz
 color : brown
 design : netted
 ...

The YAML file will follow the above format, with arbitrary string names and values (string, float, int, etc.). I want to store each of these values in a struct that holds the key and the value as character arrays.

struct Input {
 char key[100];
 char value[100];
}; 

There exists an array of the above struct to store the values from the yaml file.

So the data from the yaml files should be stored as:

 //Input[x].key                  //Input[x].value
basket.size                       10
basket.fruit1.mango               5
basket.fruit2.manufacturer        xyz
basket.color                      brown
basket.design                     netted

I wrote an application to parse the YAML file, and I get the individual nodes/leaves as string output. Based on the YAML file above, I get node values such as basket, size, 5, 43, etc. I followed the approach described here. It is one of the better resources I have found for learning YAML so far.

This approach is not that useful to me, because it gives me no relation between the parent nodes and their leaves, or vice versa.

Does libyaml provide a way to maintain this relationship in a tree and then return a value in response to a query? I am bound to use libyaml due to the project requirements, but any other suggestions would also be welcome.


Solution

  • The resource you linked describes several ways of parsing YAML. Token-based parsing, contrary to what the tutorial says, is not useful at all unless you are implementing a syntax highlighter. For all other cases, you want to use event-based parsing, so I'll assume you tried to use that.

    Does libyaml provide a way to maintain this relationship in a tree

    Event-based parsing does maintain the tree structure (I am not sure what exactly you mean by relationship in a tree): you get …Start and …End events for sequences and mappings, which describe the input structure. It is quite straightforward to build a list of struct Input by walking over the event stream:

    #include <yaml.h>
    #include <string.h>
    #include <stdio.h>
    #include <stdbool.h>
    #include <assert.h>
    
    struct Input {
      char key[100];
      char value[100];
    };
    
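    // Build one entry; snprintf truncates instead of overflowing the buffers.
    struct Input gen(const char *key, const char *value) {
      struct Input ret;
      snprintf(ret.key, sizeof ret.key, "%s", key);
      snprintf(ret.value, sizeof ret.value, "%s", value);
      return ret;
    }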
    
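    // Recursively walk one node; each scalar leaf is appended to *target
    // under the dotted path held in cur_key (len = current path length).
    void append_all(yaml_parser_t *p, struct Input **target,
            char cur_key[100], size_t len) {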
      yaml_event_t e;
      yaml_parser_parse(p, &e);
      switch (e.type) {
        case YAML_MAPPING_START_EVENT:
          yaml_event_delete(&e);
          yaml_parser_parse(p, &e);
          while (e.type != YAML_MAPPING_END_EVENT) {
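            // assume scalar key
            assert(e.type == YAML_SCALAR_EVENT);
            if (len != 0) cur_key[len++] = '.';
            // yaml_char_t is unsigned char, so cast for the string functions
            const char *name = (const char *)e.data.scalar.value;
            const size_t name_len = strlen(name);
            assert(len + name_len < 100); // key path must fit in cur_key
            memcpy(cur_key + len, name, name_len + 1);
            const size_t new_len = len + name_len;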
            yaml_event_delete(&e);
            append_all(p, target, cur_key, new_len);
            if (len != 0) --len;
            cur_key[len] = '\0'; // remove key part
            yaml_parser_parse(p, &e);
          }
          break;
        case YAML_SCALAR_EVENT:
          *(*target)++ = gen(cur_key, (const char *)e.data.scalar.value);
          break;
        default: assert(false); // sequences, aliases etc. are not handled here
      }
      yaml_event_delete(&e);
    }
    
    int main(int argc, char *argv[]) {
      yaml_parser_t p;
      yaml_event_t e;
      yaml_parser_initialize(&p);
      FILE *f = fopen("foo.yaml", "r");
      if (f == NULL) {
        perror("foo.yaml");
        yaml_parser_delete(&p);
        return 1;
      }
      yaml_parser_set_input_file(&p, f);
      // skip stream start and document start
      yaml_parser_parse(&p, &e);
      yaml_event_delete(&e);
      yaml_parser_parse(&p, &e);
      yaml_event_delete(&e);
    
      char cur_key[100] = {'\0'};
      struct Input input[100];
      struct Input *input_end = input;
      append_all(&p, &input_end, cur_key, 0);
    
      // skip document end and stream end
      yaml_parser_parse(&p, &e);
      yaml_event_delete(&e);
      yaml_parser_parse(&p, &e);
      yaml_event_delete(&e);
    
      yaml_parser_delete(&p);
      fclose(f);
    
      // print out input items
      for (struct Input *cur = input; cur < input_end; ++cur) {
        printf("%s = %s\n", cur->key, cur->value);
      }
    }