
YAML: Use a mapping vs. a list (array)


I am creating a configuration file for my application. To do it, I decided to use YAML for its simplicity and reliability.

I am currently designing a particular part of my application, in which I have to list and configure all the datasets I want to use in a module. To do that, I wrote this:

    # Other stuff
    datasets:
        rate_variation:
            name: Rate variation over time # Optional
            description: Description here # Optional
            type: POINTS_2D
            options:
                REFRESH_TIME: 5 # Refresh time in seconds
        frequency_variation:
            name: Frequency variation over time
            description: Description here # Optional
            type: POINTS_2D

However, after some reflection, I have doubts about it, because something like this might be better:

    datasets:
        -   id: rate_variation
            name: Rate variation over time # Optional
            description: Description here # Optional
            type: POINTS_2D
            options:
                REFRESH_TIME: 5 # Refresh time in seconds
        -   id: frequency_variation
            name: Frequency variation over time
            description: Description here # Optional
            type: POINTS_2D

I use the ID to identify each dataset in my scripts (no two datasets may share an ID) and to generate an output file for each of them. But now I really don't know which solution is better.

Which would you recommend, and why?


Solution

  • With the first option, the IDs are mapping keys, and the YAML specification requires mapping keys to be unique. Therefore, an editor with YAML support may help your user by showing an error when an ID is duplicated. (Not every parser enforces this: PyYAML, for example, silently keeps the last of the duplicated keys.) With the second option, you need to check uniqueness in your own code, and the user only sees the error when your application loads the syntactically valid YAML.

    However, there are other factors to consider. For example, you may prefer one resulting in-memory data structure over the other. If you use a standard YAML implementation that deserializes to native data structures (PyYAML, SnakeYAML, etc.), the YAML structure determines the type of the in-memory data structure (you can customize this by writing custom constructors, but that's not trivial). For example, if you want to ask a dataset object for its ID, that is only directly possible with the second structure. With the first structure, the ID is the key in the parent mapping rather than a field of the dataset, so you would need to search the parent mapping for the dataset to find its ID.

    So the final answer is, as always: it depends. Think about how you will use the data. For simple configuration files, my second argument may carry less weight than the first, but I don't know exactly what you want to do with the data.
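Whichever form you pick, the gap between them can be bridged in application code. The following is a minimal Python sketch, assuming the file has already been loaded into plain dicts and lists (for example with PyYAML's `yaml.safe_load`); the helper names `check_unique_ids` and `mapping_to_list` are hypothetical. It shows the uniqueness check the list form requires, and a conversion that copies each mapping key into its value as `id`, so datasets from the mapping form can report their own ID.

```python
from typing import Any


def check_unique_ids(datasets: list[dict[str, Any]]) -> None:
    """List form: the application must enforce unique IDs itself."""
    seen: set[str] = set()
    for entry in datasets:
        dataset_id = entry["id"]
        if dataset_id in seen:
            raise ValueError(f"duplicate dataset id: {dataset_id!r}")
        seen.add(dataset_id)


def mapping_to_list(datasets: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """Mapping form: copy each key into its value as "id", so a dataset
    can report its own ID without a reverse lookup in the parent mapping."""
    return [{"id": key, **value} for key, value in datasets.items()]


# Hypothetical parsed data for the two files in the question,
# abbreviated to the fields used here.
as_mapping = {
    "rate_variation": {"type": "POINTS_2D", "options": {"REFRESH_TIME": 5}},
    "frequency_variation": {"type": "POINTS_2D"},
}
as_list = [
    {"id": "rate_variation", "type": "POINTS_2D", "options": {"REFRESH_TIME": 5}},
    {"id": "frequency_variation", "type": "POINTS_2D"},
]

check_unique_ids(as_list)  # passes: the IDs are distinct
assert mapping_to_list(as_mapping) == as_list

try:
    check_unique_ids(as_list + [{"id": "rate_variation", "type": "POINTS_2D"}])
except ValueError as err:
    print(err)  # duplicate dataset id: 'rate_variation'
```

With this normalization, the rest of the scripts can work with one shape (a list of records that carry their own `id`) regardless of which YAML layout the user writes.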