c++ logging scientific-computing

"Logbook" for scientific simulations


I'm using C++ to run scientific simulations. Because the number of parameters keeps growing, I now find it necessary to keep a "logbook": a file that stores all the information about a given simulation (not the output, but the parameters that led to that output and the corresponding git commit).

I've searched around, and XML seems like a good option, since it can easily be parsed with Python, Mathematica or other analysis software.

I wonder if anyone agrees with this, or has a better option.

I'd also like to know how I can get the current git commit so that I can save it in the logbook.


Solution

  • In general I agree with you:

    • XML is widely deployed; there are tonnes of tools to bring the logbook into shape.
    • It's flexible: you can add attributes later without breaking old "scripts".
    • It's file based: one document per file, and you can use the filesystem to organise logbook "pages".
    • It's file based and plain text, so tools like find, grep and diff (at a push) can help you in urgent cases.
    • It's your own solution: you're free to track any information you need, and if you deem it essential to associate sunlight hours with the parameters, you can do it.

    That being said, I should add that the right storage format depends on the typical use case. If you need to find out why the optimiser cannot find any solutions every Monday after a full moon, it will be hard (well, harder) to come up with the necessary XPath/XQuery hackery, because your ad hoc structure follows no fixed schema.

    The downsides I can think of:

    • It's verbose: XML documents in my area tend to be more like 20 to 40 GB, whereas the information could probably be represented in more like 500 MB.
    • It's slow (depending on how you use it): RDBMSs and even NoSQL solutions employ techniques like indexing to make reading faster.
    • It's flexible, which is also a downside: if you happen to add two new attributes per day, you will end up with nothing but marked-up free text that needs thorough polishing before you can import it into structure-focussed systems (SQL, CSV, JSON, ...).
    • It's your own solution: you have to write it and maintain it.

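    As for the second bit, the following prints a name for the current commit, based on the most recent reachable tag and falling back to the abbreviated commit hash if there is none: git describe --always HEAD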