The MongoDB Application FAQ mentions that short field names are a technique that can be used for small documents. This got me thinking: what's a small document anyway?
I'm using pymongo. Is there any way I can write some Python to scan a collection and get a feel for the ratio of bytes used for field names vs. bytes used for actual field data?
I'm also tangentially curious about the basic byte overhead per document.
There is no built-in way to get the ratio of space used for keys in BSON documents versus space used for the actual field values. However, the collstats and dbstats commands give you useful information on collection and database size. Here's how to use them in pymongo:
from pymongo import MongoClient
client = MongoClient()
db = client.test
# print collection statistics
print(db.command("collstats", "events"))
# print database statistics
print(db.command("dbstats"))
You could always hack something up to get a pretty good estimate though. If all of your documents in a collection have the same schema, then something like this isn't half bad:
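A minimal sketch of that kind of estimate, under a few assumptions: every document in the collection shares the schema of the first one returned by `find_one()`, and the `size` and `count` fields reported by `collstats` are used as the total data size and document count. The helper names `field_name_bytes` and `field_name_ratio` are made up for illustration and are not part of pymongo.

```python
def field_name_bytes(value):
    """Bytes BSON spends on field names inside `value`.

    Each field name is stored as a UTF-8 C string, so it costs its
    encoded length plus one NUL terminator. Nested documents and
    documents inside arrays are included. Array index keys ("0", "1", ...)
    are deliberately not counted, since they cannot be shortened.
    """
    if isinstance(value, dict):
        return sum(
            len(name.encode("utf-8")) + 1 + field_name_bytes(child)
            for name, child in value.items()
        )
    if isinstance(value, list):
        return sum(field_name_bytes(child) for child in value)
    return 0


def field_name_ratio(db, collection_name):
    """Estimate the share of a collection's data size used by field names.

    Assumption: all documents have the same schema, so a single sample
    document is scaled up by the document count.
    """
    sample = db[collection_name].find_one()
    stats = db.command("collstats", collection_name)
    if sample is None or not stats["size"]:
        return 0.0
    return field_name_bytes(sample) * stats["count"] / float(stats["size"])


# With the `db` handle from the snippet above:
# d = field_name_ratio(db, "events")
```

As for the fixed overhead: per the BSON format, each document carries a 4-byte length prefix and a 1-byte terminator, and each field adds one type byte on top of its name and the name's NUL terminator.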
Now d is the proportion of the total data size of the collection which is used to store field names.