I'm new to software development, and I'm planning to write a small but useful program to help people recite and review English words.
As a long-time Linux user, I have noticed that user-facing applications save their configuration in various ways: XML, plain text, SQLite, etc. Now that I'm becoming a developer, I suppose the most important thing is to choose a suitable approach.
So here is my question: how do most applications persist their data? More specifically, could you list some commonly used methods of storing data? By "data" I mean whatever lets the application remember its state and get back to it the next time it starts up (maybe that's what a configuration file is all about). BTW, I'm curious about plain-text configuration files, which look like this:
property1 = value1
property2 = value2
...
I wonder how programmers implement this scheme: do they use regexes, call third-party libraries, or just write code to parse it directly?
As for my little project, it has two kinds of persistent data: a user-friendly configuration, and a database that stores English words with related information.
Thanks in advance for your patience! :)
It really depends upon the form of the data, its approximate size, the performance requirements, the ability to pre- or post-process the data, etc.
Also, you may (or may not) want to keep your data in some textual format. The main advantage of textual formats (including XML, JSON, YAML, ...) is that they are portable and easy for the developer to inspect. You can also manage them with tools made for source code (e.g. version control systems like git). The disadvantage is that they take more space on disk and more time to parse and generate (however, XML, JSON, and YAML have numerous libraries supporting them). Textual data is also hard to access directly: you basically need to read all of it to use it.
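As an illustration of the textual approach, here is a minimal sketch in Python that saves and restores application state as JSON using the standard `json` module. The function names and the keys in the state dictionary are made up for the example; note that the whole file is read back at once, which is the "read all of it" cost mentioned above.

```python
import json

def save_state(path, state):
    """Write the state dictionary to `path` as human-readable JSON."""
    with open(path, "w", encoding="utf-8") as f:
        # indent and sort_keys keep the file readable and diff-friendly
        json.dump(state, f, indent=2, sort_keys=True)

def load_state(path, default):
    """Return the saved state, or a copy of `default` on first start."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        # No file yet: the application has never saved its state.
        return dict(default)
```

A program would typically call `load_state` at startup and `save_state` on exit or after each change.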
You could also view the problem as a serialization or persistence issue. You could consider using formats like XDR or ASN.1, or your own format, or libraries like s11n. You may be interested in having a binary but portable format (i.e. the file would be readable on a system with a different processor, endianness, and word size).
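To make "binary but portable" concrete, here is a small sketch of a home-grown format using Python's standard `struct` module. The layout (a 4-byte signature, a record count, then length-prefixed UTF-8 words with a review counter) is purely hypothetical; the point is that the `<` prefix fixes the byte order and field sizes, so the file does not depend on the machine that wrote it.

```python
import struct

MAGIC = b"WRD1"   # hypothetical 4-byte file signature
RECORD = "<HI"    # little-endian: 16-bit word length, 32-bit review count

def pack_words(words):
    """Encode a list of (word, review_count) pairs into bytes."""
    out = [MAGIC, struct.pack("<I", len(words))]
    for word, count in words:
        raw = word.encode("utf-8")
        out.append(struct.pack(RECORD, len(raw), count))
        out.append(raw)
    return b"".join(out)

def unpack_words(data):
    """Decode bytes produced by pack_words back into a list of pairs."""
    if data[:4] != MAGIC:
        raise ValueError("not a word file")
    (n,) = struct.unpack_from("<I", data, 4)
    offset = 8
    words = []
    for _ in range(n):
        length, count = struct.unpack_from(RECORD, data, offset)
        offset += struct.calcsize(RECORD)
        words.append((data[offset:offset + length].decode("utf-8"), count))
        offset += length
    return words
```

Established formats such as XDR solve the same problem with a documented standard, which is usually preferable to inventing a layout.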
If you think of your data as configuration data (which would be read, not written, by your program), make it textual and provide a way to express comments: the sysadmin will be very happy to add comments to configuration files, which your program should ignore.
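For the `property = value` style shown in the question, writing the parser directly is usually enough, with no regex needed. The following is a minimal sketch in Python; the choice of `#` as the comment marker and the function name are assumptions for the example, not part of any standard.

```python
def parse_config(text):
    """Parse 'key = value' lines; blank lines and '#' comment lines are skipped."""
    settings = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected 'key = value'")
        settings[key.strip()] = value.strip()
    return settings
```

Every value comes back as a string, so the application still has to convert numbers or booleans itself. For INI-style files with `[section]` headers, Python's standard library also ships the `configparser` module.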
If you have a large amount of data which should be accessed and updated quickly, a binary format (similar to what is inside RAM) may be preferable. In that case, think of a dump and restore facility that produces (and consumes) some textual representation.
You may also think in terms of application checkpointing, which uses algorithms similar to those used for garbage collection.
Another question is how valuable your data is. If you believe it should survive while your application evolves (new versions, new features), think hard about that, and document the data format well.
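SQLite, which the question mentions, is one example of a binary store that comes with such a facility. The sketch below uses Python's standard `sqlite3` module: `Connection.iterdump()` produces the database as SQL text, and `executescript()` rebuilds a database from it. The helper names and the in-memory target database are choices made for this illustration.

```python
import sqlite3

def dump_to_text(conn):
    """Return the whole database as SQL text (the 'dump' side)."""
    return "\n".join(conn.iterdump())

def restore_from_text(sql_text):
    """Rebuild a fresh in-memory database from dumped SQL (the 'restore' side)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)
    return conn
```

The dumped text can be kept under version control or used to move the data to a newer schema, while the binary database file serves day-to-day lookups and updates.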