unicode utf-8 filesystems cross-platform file-format

How to store metadata for a UTF-8 text file cross-platform?


I'm looking for a cross-platform way to store small amounts of metadata for UTF-8 text files, such as the current selection and cursor position. I know about filesystem-specific solutions like extended attributes on Linux, and I have read up on various solutions for macOS here, but I'm wondering whether there is a "most accepted, lowest common denominator" approach across platforms.

Is it possible to end a UTF-8 file with a special marker that many plain text editors recognize? If not, and I have to store the metadata in another file, is there a best practice for the file format and how to name this file? Or, is there a file format using UTF-8 like .rtf that allows me to store metadata and is handled just fine by most platforms?

I'm trying to decide on the best approach for an application running on Windows, Linux, and macOS, and what I've found so far is very platform- and filesystem-specific.


Solution

  • From the filesystem standpoint, extended attributes (EA) on Linux and macOS and either EA or alternate data streams (ADS) on Windows are the best available options. The problem is that EAs are not well supported, and they are usually not preserved when files are copied (on Windows at least; I can't speak for other platforms). ADS are handled better on Windows, but they are not cross-platform. If you go this route, it makes more sense to use ADS on Windows and EA on Linux/macOS.

    As for external files, their name and format are completely up to you. Any Type-Size-Value sequence (more commonly called Type-Length-Value, or TLV) would do. I've been using custom sequences for years, and recently I decided to use Protocol Buffers (Protobuf) for one of our projects, where the data should potentially be readable by third parties. Protobuf lets you extend the format later should you need to, and it generates the serialization code for you (you only describe the data structure).
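
To make the external-file option concrete, here is a minimal sketch of a Type-Size-Value sidecar file in Python. Everything specific in it is an assumption made for illustration and does not come from the answer: the `.meta` suffix, the type IDs, the 2-byte type and 4-byte size header, and the helper names. It stores a cursor offset and a selection range next to the text file, and it skips record types it does not recognize, so fields can be added later.

```python
import struct
from pathlib import Path

# Hypothetical record type IDs, chosen for this sketch only.
TYPE_CURSOR = 1      # payload: one unsigned 32-bit offset
TYPE_SELECTION = 2   # payload: two unsigned 32-bit offsets (start, end)

# Record header: little-endian 2-byte type followed by 4-byte payload size.
_HEADER = struct.Struct("<HI")


def encode_records(records):
    """Serialize (type_id, payload_bytes) pairs as Type-Size-Value records."""
    out = bytearray()
    for type_id, payload in records:
        out += _HEADER.pack(type_id, len(payload))
        out += payload
    return bytes(out)


def decode_records(data):
    """Parse Type-Size-Value records back into (type_id, payload_bytes) pairs."""
    records = []
    offset = 0
    while offset < len(data):
        if offset + _HEADER.size > len(data):
            raise ValueError("truncated record header")
        type_id, size = _HEADER.unpack_from(data, offset)
        offset += _HEADER.size
        if offset + size > len(data):
            raise ValueError("truncated record payload")
        records.append((type_id, data[offset:offset + size]))
        offset += size
    return records


def sidecar_path(text_path):
    """Assumed naming convention: notes.txt -> notes.txt.meta"""
    p = Path(text_path)
    return p.with_name(p.name + ".meta")


def save_metadata(text_path, cursor, selection):
    """Write cursor and selection (start, end) to the sidecar file."""
    payload = encode_records([
        (TYPE_CURSOR, struct.pack("<I", cursor)),
        (TYPE_SELECTION, struct.pack("<II", *selection)),
    ])
    sidecar_path(text_path).write_bytes(payload)


def load_metadata(text_path):
    """Read the sidecar file into a dict, ignoring unknown record types."""
    meta = {}
    raw = sidecar_path(text_path).read_bytes()
    for type_id, payload in decode_records(raw):
        if type_id == TYPE_CURSOR:
            (meta["cursor"],) = struct.unpack("<I", payload)
        elif type_id == TYPE_SELECTION:
            meta["selection"] = struct.unpack("<II", payload)
        # Any other type is skipped, so newer writers can add fields.
    return meta
```

Calling `save_metadata("notes.txt", 10, (3, 7))` would write `notes.txt.meta` beside the text file, and `load_metadata("notes.txt")` would read it back as a dict. A sidecar file survives any filesystem, but it has to be copied or moved together with the text file, which is the trade-off against EA and ADS.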