I'm using FlatBuffers (C++) to store metadata information about a file. This includes EXIF, IPTC, GPS and various other metadata values.
In my current schema, I have a fairly normalized definition whereby each of the groups listed above has its own table. The root table just includes properties for each sub-table.
Basic Example:
table GPSProperties {
  latitude:double;
  longitude:double;
}

table ContactProperties {
  name:string;
  email:string;
}

table EXIFProperties {
  camera:string;
  lens:string;
  gps:GPSProperties;
}

table IPTCProperties {
  city:string;
  country:string;
  contact:ContactProperties;
}

table Registry {
  exifProperties:EXIFProperties;
  iptcProperties:IPTCProperties;
}

root_type Registry;
This works, but the nesting restrictions when building a buffer are starting to make the code pretty messy. Also, the properties are split into separate tables only for clarity in the schema.
I'm considering just "flattening" the entire schema into a single table but I was wondering if there are any performance or memory implications of doing that. This single table could have a few hundred fields, though most would be empty.
Proposal:
table Registry {
  exif_camera:string;
  exif_lens:string;
  exif_gps_latitude:double;
  exif_gps_longitude:double;
  iptc_city:string;
  iptc_country:string;
  iptc_contact_name:string;
  iptc_contact_email:string;
}

root_type Registry;
Since properties that are either not set or set to their default value don't take up any memory, I'm inclined to believe that a flattened schema might not be a problem. But I'm not certain.
(Note that performance is my primary concern, followed closely by memory usage. The normalized schema is performing excellently, but I think a flattened schema would really help me clean up my codebase.)
Since most of your data is strings, the size and speed of both of these designs will be very similar, so you should probably choose based on what works better for you from a software engineering perspective.
That said, the flat version will likely be slightly more efficient in size (fewer vtables) and will certainly be faster to access (though again, the gain is marginal given that the data is mostly strings).
The only way in which the flat version could be less efficient is if you store a lot of these tables in one buffer and the set of populated fields varies wildly from one table to the next. In that case the non-flat version may share more vtables.
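To make the vtable-sharing point concrete, here is a rough toy model in C++. It does not use the FlatBuffers library at all. It assumes that two tables share a vtable exactly when they are the same table type and have the same set of fields present, which is a simplification of what the builder actually does (it compares vtable contents). The `Record` type, the bitmask encoding and all function names are made up for this illustration, and only two fields per group are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical summary of one stored entry: which fields are populated.
struct Record {
    std::uint8_t exif_mask;  // bit 0: camera, bit 1: lens
    std::uint8_t iptc_mask;  // bit 0: city,   bit 1: country
};

// Flat schema: one table per record, so every distinct combination of
// populated fields is counted as its own vtable layout.
std::size_t CountFlatVtables(const std::vector<Record>& records) {
    std::set<std::uint16_t> layouts;
    for (const Record& r : records) {
        assert(r.exif_mask < 4 && r.iptc_mask < 4);
        layouts.insert(static_cast<std::uint16_t>(r.exif_mask | (r.iptc_mask << 2)));
    }
    return layouts.size();
}

// Nested schema: each sub-table has its own layout, and the root table's
// layout depends only on which sub-tables are present.
std::size_t CountNestedVtables(const std::vector<Record>& records) {
    std::set<std::pair<int, std::uint8_t>> layouts;  // (table type, field mask)
    for (const Record& r : records) {
        if (r.exif_mask) layouts.insert({0, r.exif_mask});
        if (r.iptc_mask) layouts.insert({1, r.iptc_mask});
        const std::uint8_t root = static_cast<std::uint8_t>(
            (r.exif_mask ? 1 : 0) | (r.iptc_mask ? 2 : 0));
        layouts.insert({2, root});
    }
    return layouts.size();
}

// Worst case for the flat layout: every combination of fields occurs once.
std::vector<Record> AllCombinations() {
    std::vector<Record> records;
    for (std::uint8_t e = 0; e < 4; ++e)
        for (std::uint8_t i = 0; i < 4; ++i)
            records.push_back(Record{e, i});
    return records;
}
```

Under this model, feeding `AllCombinations()` to both functions gives 16 layouts for the flat schema and 10 for the nested one. The flat count grows with the product of the per-group combinations, while the nested count grows roughly with their sum. With a single `Registry` per buffer, none of this matters.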
In the non-flat version, tables like GPSProperties could be a struct if the fields are unlikely to ever change, which would be more efficient.
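As a sketch of that suggestion, based on the schema in the question, the change would look roughly like this. Note that a struct's fields are always all present, and fields cannot be added to or removed from it later, so this only fits if latitude and longitude are all you will ever need there.

```
struct GPSProperties {
  latitude:double;
  longitude:double;
}

table EXIFProperties {
  camera:string;
  lens:string;
  gps:GPSProperties;
}
```

The struct is stored inline in EXIFProperties rather than as a separate table with its own vtable, and the gps field as a whole can still be left unset.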