The following code tries to insert new documents into MongoDB through go.mongodb.org/mongo-driver:
data := "this is test string blablablablablablabla"
type Doc struct {
Version int "json:version, bson:version"
Data string "json:data, bson:data"
}
dd := Doc{Version: 21, Data: data}
dObj, _ := json.Marshal(dd)
queryFilter := bson.M{"version": 1}
update1 := bson.M{"$set": bson.M{"version": 1, "data": json.RawMessage(dObj)}}
// insert data with json.RawMessage
_, err := db.Mongo("test").Collection("test_doc1").UpdateOne(context.Background(), queryFilter, update1, options.Update().SetUpsert(true))
if err != nil {
fmt.Println("failed to insert doc1")
}
update2 := bson.M{"$set": bson.M{"version": 1, "data": (dObj)}}
// insert data without json.RawMessage
_, err = db.Mongo("test").Collection("test_doc2").UpdateOne(context.Background(), queryFilter, update2, options.Update().SetUpsert(true))
if err != nil {
fmt.Println("failed to insert doc2")
}
The content of test_doc1 is "data": json.RawMessage(dObj), whereas the content of test_doc2 is "data": (dObj).
The resulting documents are shown below:
db.test_doc1.find()
{ "_id" : ObjectId("5da164a950d625a5b2e5d23e"), "version" : 1, "data" : [ 123, 34, 86, 101, 114, 115, 105, 111, 110, 34, 58, 50, 49, 44, 34, 68, 97, 116, 97, 34, 58, 34, 116, 104, 105, 115, 32, 105, 115, 32, 116, 101, 115, 116, 32, 115, 116, 114, 105, 110, 103, 32, 98, 108, 97, 98, 108, 97, 98, 108, 97, 98, 108, 97, 98, 108, 97, 98, 108, 97, 98, 108, 97, 34, 125 ] }
db.test_doc2.find()
{ "_id" : ObjectId("5da164a950d625a5b2e5d249"), "version" : 1, "data" : BinData(0,"eyJWZXJzaW9uIjoyMSwiRGF0YSI6InRoaXMgaXMgdGVzdCBzdHJpbmcgYmxhYmxhYmxhYmxhYmxhYmxhYmxhIn0=") }
Checking the size of the two documents above gives:
Object.bsonsize(db.test_doc2.findOne())
111
Object.bsonsize(db.test_doc1.findOne())
556
The document in test_doc1 is much larger than the one in test_doc2. Why?
Per the BSON documentation:
Array - The document for an array is a normal BSON document with integer values for the keys, starting with 0 and continuing sequentially. For example, the array ['red', 'blue'] would be encoded as the document {'0': 'red', '1': 'blue'}. The keys must be in ascending numerical order.
So a BSON array occupies more disk space than the equivalent binary data? Am I right?
MongoDB version: 4.0
test_doc1 uses json.RawMessage, which is essentially a []byte, so it gets stored as an array of integers, one per byte of the raw JSON representation of the document.
test_doc2 stores the data as binary data, which is a more compact form.
The Go Mongo Driver uses the WriteBinaryWithSubtype method for the JSON-encoded data but uses WriteArray for the RawMessage.
The difference is in the data type used on the MongoDB side to store the data. One stores the byte slice as an array of integers; the other stores it as binary data with a subtype. The binary form takes less space because every array element carries its own type byte and key in addition to the value, whereas binary data stores one byte per byte plus a small fixed header.
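To make the size difference concrete, here is a rough sketch that reproduces the two bsonsize results by hand from the BSON layout. It assumes the 65-byte JSON payload shown in the question, an ObjectId _id, and that each array element is stored as a 32-bit integer; the helper names are made up for illustration and are not part of the driver.

```go
package main

import (
	"fmt"
	"strconv"
)

// elementSize is the size of one BSON element:
// 1 type byte + key bytes + 1 NUL terminator + the value itself.
func elementSize(key string, valueSize int) int {
	return 1 + len(key) + 1 + valueSize
}

// binaryValueSize is the size of a BSON binary value:
// int32 length + 1 subtype byte + the payload.
func binaryValueSize(payloadLen int) int {
	return 4 + 1 + payloadLen
}

// arrayValueSize is the size of a BSON array holding one int32 per payload
// byte (assumption: int32 elements). An array is an embedded document whose
// keys are the decimal indexes "0", "1", "2", ...
func arrayValueSize(payloadLen int) int {
	size := 4 + 1 // int32 document length + trailing NUL
	for i := 0; i < payloadLen; i++ {
		size += elementSize(strconv.Itoa(i), 4)
	}
	return size
}

// documentSize is the size of {_id: ObjectId, version: int32, data: <value>}.
func documentSize(dataValueSize int) int {
	return 4 + // int32 document length
		elementSize("_id", 12) + // ObjectId is 12 bytes
		elementSize("version", 4) + // int32
		elementSize("data", dataValueSize) +
		1 // trailing NUL
}

func main() {
	const payloadLen = 65 // length of the marshaled JSON in the question

	fmt.Printf("binary: %d\n", documentSize(binaryValueSize(payloadLen)))
	fmt.Printf("array: %d\n", documentSize(arrayValueSize(payloadLen)))
}
```

Under these assumptions the arithmetic comes out to 111 bytes for the binary form and 556 bytes for the array form, matching the Object.bsonsize values in the question: each byte costs 7 or 8 bytes as an array element, versus 1 byte inside binary data.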
Digging deeper, I noticed the Go driver uses a registry to determine how it should encode a value to BSON. There's a method dedicated to byte slices.
// ByteSliceEncodeValue is the ValueEncoderFunc for []byte.
func (dve DefaultValueEncoders) ByteSliceEncodeValue(ec EncodeContext, vw bsonrw.ValueWriter, val reflect.Value) error {
This method uses the WriteBinary() method to encode byte slices as binary data.
Whereas a custom type (even if it's a []byte underneath) is treated as a generic slice type and triggers the "default encoder" for slices.
// SliceEncodeValue is the ValueEncoderFunc for slice types.
func (dve DefaultValueEncoders) SliceEncodeValue(ec EncodeContext, vw bsonrw.ValueWriter, val reflect.Value) error {
This method, in turn, uses the WriteArray() method.
Summary: The json.Marshal call returns a plain []byte, so its result is treated as the BSON binary type and stored in the compact binary form. json.RawMessage, even though it stores the data as a []byte internally, is a distinct named type, so it is treated as a generic slice (a slice of integers) and thus stored in MongoDB as an array of integers.