json xml database protocol-buffers binaryfiles

How should a "project" file be written?


With popular software packages such as Microsoft Word or Photoshop, we can often save our progress as a "project" file and later reopen that file to continue editing. This file usually contains all the options and the progress the user has made (e.g. the essay you typed in Word).

So my question is: if I am writing a similar application that needs a similar "project" file, how should I go about it? My application is a scientific one, which means it requires a lot of (multi-dimensional) arrays. I understand there are many ways to do this, but I would like to know the de facto way.

Here are some of the options I have outlined:

  1. XML: Human-readable. The files get too big and it's too much work to deal with arrays.
  2. JSON: More popular/modern. Good with arrays.
  3. Protocol Buffers: Created by Google. Probably faster.
  4. Database: Probably not a good use case since "project" files are most likely "temporary". Also, working with arrays is not very straightforward.
  5. Creating your own binary format: Might be the most difficult solution for an inexperienced programmer like myself.
  6. ???

I would like to get some advice from you guys. Thank you :).


Solution

  • (Good question. :) Just some thoughts.) I'd prefer a text format for the main project file: you can diff it, open it, read it and modify it easily. Large ASCII or binary data can be stored as serialized data in external files, or in a database like SQLite, where the application can easily access and process it. The main project file then holds links to the external data store.

    My advice for the main project file is a simple XML format that can easily be transformed to JSON. A list of key-value pairs (a dict) is a good start, where each value is either a basic datatype, an array or a dict. A complicated XML tree is not good. The key name can also help to describe and structure the data, so I'd prefer key="rect.4711.pos.x" value="500" over <rect id="4711"><pos><x>500</x>...</pos>....

    An important aspect is that the project data is portable and self-contained, and that the user sees the project as a single unit even if it is a directory on the file system. For this purpose, supporting some kind of zipped format for the project data is good.
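To make the idea a bit more concrete, here is a minimal sketch in Python of such a layout, using only the standard library. It is one possible reading of the advice above, not a definitive format. The function names (`flatten`, `save_project`, `load_project`), the member names `project.json` and `data/<name>.bin`, and the choice of little-endian float64 for array data are all assumptions made for illustration. It also uses JSON directly for the main file instead of the simple XML suggested above, to keep the example short.

```python
import io
import json
import struct
import zipfile


def flatten(tree, prefix=""):
    """Turn a nested dict into flat dotted keys, e.g. 'rect.4711.pos.x'."""
    flat = {}
    for key, value in tree.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def save_project(target, settings, arrays):
    """Write settings and arrays into a single zip archive.

    `arrays` maps a name to (shape, flat list of floats).
    The layout (project.json plus data/*.bin) is hypothetical.
    """
    manifest = {"settings": flatten(settings), "arrays": {}}
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, (shape, values) in arrays.items():
            member = f"data/{name}.bin"
            # Raw little-endian float64 values; the shape is kept in the manifest.
            archive.writestr(member, struct.pack(f"<{len(values)}d", *values))
            manifest["arrays"][name] = {"file": member, "shape": list(shape)}
        # The human-readable main file only links to the binary members.
        archive.writestr("project.json", json.dumps(manifest, indent=2))


def load_project(source):
    """Read back the flat settings and the arrays written by save_project."""
    with zipfile.ZipFile(source) as archive:
        manifest = json.loads(archive.read("project.json"))
        arrays = {}
        for name, entry in manifest["arrays"].items():
            raw = archive.read(entry["file"])
            values = list(struct.unpack(f"<{len(raw) // 8}d", raw))
            arrays[name] = (tuple(entry["shape"]), values)
    return manifest["settings"], arrays


if __name__ == "__main__":
    buffer = io.BytesIO()  # stands in for a file such as "experiment.proj"
    settings = {"rect": {"4711": {"pos": {"x": 500, "y": 120}}}}
    arrays = {"temperature": ((2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])}
    save_project(buffer, settings, arrays)
    loaded_settings, loaded_arrays = load_project(buffer)
    print(loaded_settings)
    print(loaded_arrays["temperature"][0])
```

The user sees one file, the text part stays readable and diffable once unzipped, and the bulky array data lives outside the main file. In a real scientific application a library format for array data (or SQLite, as mentioned above) may be a better fit than hand-packed floats.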