MongoDB C++ driver BSON construction: stream-based vs. JSON string parsing. Which one performs better?

The MongoDB C++ driver allows two ways (among others) of creating BSON objects.

Stream-based:

auto builder = bsoncxx::builder::stream::document{};
bsoncxx::document::value doc_value = builder
  << "name" << "MongoDB"
  << "type" << "database"
  << "count" << 1
  << "versions" << bsoncxx::builder::stream::open_array
    << "v3.2" << "v3.0" << "v2.6"
  << bsoncxx::builder::stream::close_array
  << "info" << bsoncxx::builder::stream::open_document
    << "x" << 203
    << "y" << 102
  << bsoncxx::builder::stream::close_document
  << bsoncxx::builder::stream::finalize;

Based on parsing a JSON string:

std::string doc = "{ "
  "\"name\" : \"MongoDB\","
  "\"type\" : \"database\","
  "\"count\" : 1,"
  "\"versions\": [ \"v3.2\", \"v3.0\", \"v2.6\" ],"
  "\"info\" : {"
    "\"x\" : 203,"
    "\"y\" : 102"
  "}"
"}";
bsoncxx::document::value doc_value = bsoncxx::from_json(doc);

I would like to know which one performs better. My guess is that the many function calls the stream alternative makes "under the hood" would make it slower than processing the JSON string, but it could be the other way around, or they could be equal.

I have tried to find information about this in the MongoDB C++ driver documentation, with no luck. Any information is welcome. Thanks in advance!

Solution

In the end I did some benchmarking. I'm sharing the results in case they are useful to others. The driver version is 3.4.0.

This is the stream based version:

#include <iostream>

#include <bsoncxx/builder/stream/document.hpp>
#include <bsoncxx/json.hpp>

#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>


int main(int, char**) {
    mongocxx::instance inst{};
    mongocxx::client conn{mongocxx::uri{}};

    for (unsigned int ix = 0; ix < 10000000; ++ix) {
        auto builder = bsoncxx::builder::stream::document{};
        bsoncxx::document::value doc_value = builder
            << "name" << "MongoDB"
            << "type" << "database"
            << "count" << 1
            << "versions" << bsoncxx::builder::stream::open_array
                << "v3.2" << "v3.0" << "v2.6"
            << bsoncxx::builder::stream::close_array
            << "info" << bsoncxx::builder::stream::open_document
                << "x" << 203
                << "y" << 102
            << bsoncxx::builder::stream::close_document
            << bsoncxx::builder::stream::finalize;
    }
}

This is the text-parsing version:

#include <iostream>

#include <bsoncxx/builder/stream/document.hpp>
#include <bsoncxx/json.hpp>

#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>


int main(int, char**) {
    mongocxx::instance inst{};
    mongocxx::client conn{mongocxx::uri{}};

    for (unsigned int ix = 0; ix < 10000000; ++ix) {
        std::string doc = "{ "
            "\"name\" : \"MongoDB\","
            "\"type\" : \"database\","
            "\"count\" : 1,"
            "\"versions\": [ \"v3.2\", \"v3.0\", \"v2.6\" ],"
            "\"info\" : {"
                "\"x\" : 203,"
                "\"y\" : 102"
            "}"
        "}";
        bsoncxx::document::value doc_value = bsoncxx::from_json(doc);
    }
}

As you can see, the program structure and the number of iterations (10,000,000) are the same in both cases.

Compiled using:

c++ --std=c++11 test-stream.cpp -o test-stream $(pkg-config --cflags --libs libmongocxx)
c++ --std=c++11 test-textparsing.cpp -o test-textparsing $(pkg-config --cflags --libs libmongocxx)

Results for test-stream (three runs):

$ time ./test-stream ; time ./test-stream ; time ./test-stream 

real    0m16,454s
user    0m16,200s
sys     0m0,084s

real    0m17,034s
user    0m16,900s
sys     0m0,012s

real    0m18,812s
user    0m18,708s
sys     0m0,036s

Results for test-textparsing (also three runs):

$ time ./test-textparsing ; time ./test-textparsing ; time ./test-textparsing 

real    0m53,678s
user    0m53,576s
sys     0m0,024s

real    1m0,203s
user    0m59,788s
sys     0m0,116s

real    0m57,259s
user    0m56,824s
sys     0m0,200s

Conclusion: the stream-based strategy outperforms text parsing by a large margin (about 17.4 s vs. 57.0 s mean real time, i.e. roughly 3.3 times faster).

A peer review of the experiment to confirm the results would be great ;)
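For a peer check, it may help to time only the loop inside the program, since `time` also counts process start-up and the `mongocxx::instance`/`mongocxx::client` setup, and to compile with optimizations enabled (for example `-O2`), which the commands above do not pass. The helper below is a generic sketch that uses only `std::chrono`; the names `time_loop` and `g_time_loop_sink` are hypothetical.

```cpp
#include <chrono>
#include <cstddef>

// Receives a checksum of the loop results so the work stays observable.
volatile std::size_t g_time_loop_sink = 0;

// Calls fn() `iterations` times and returns the elapsed wall-clock time in
// seconds. fn must return a std::size_t (for example the BSON length); the
// values are summed and written to a volatile, so they must actually be computed.
template <typename Fn>
double time_loop(std::size_t iterations, Fn&& fn) {
    std::size_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        checksum += fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    g_time_loop_sink = checksum;
    return std::chrono::duration<double>(stop - start).count();
}
```

Each of the loop bodies above could be wrapped in a lambda that builds `doc_value` and returns `doc_value.view().length()`, so that all variants are timed from a single program under identical conditions.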

EDIT: I have added a test case based on the basic builder:

#include <iostream>

#include <bsoncxx/builder/basic/array.hpp>
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <bsoncxx/json.hpp>

#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>

using bsoncxx::builder::basic::kvp;

int main(int, char**) {
    mongocxx::instance inst{};
    mongocxx::client conn{mongocxx::uri{}};

    for (unsigned int ix = 0; ix < 10000000; ++ix) {
        bsoncxx::builder::basic::document basic_builder{};
        basic_builder.append(kvp("name", "MongoDB"));
        basic_builder.append(kvp("type", "database"));
        basic_builder.append(kvp("count", 1));

        bsoncxx::builder::basic::array array_builder{};
        array_builder.append("v3.2");
        array_builder.append("v3.0");
        array_builder.append("v2.6");
        basic_builder.append(kvp("versions", array_builder.extract()));

        bsoncxx::builder::basic::document object_builder{};
        object_builder.append(kvp("x", 203));
        object_builder.append(kvp("y", 102));
        basic_builder.append(kvp("info", object_builder.extract()));

        bsoncxx::document::value doc_value = basic_builder.extract();
    }
}

Compiled this way:

c++ --std=c++11 test-basic.cpp -o test-basic $(pkg-config --cflags --libs libmongocxx)

I ran all the tests again, with these results:

basic
-----

real    0m20,725s
user    0m20,656s
sys     0m0,004s

real    0m20,651s
user    0m20,620s
sys     0m0,008s

real    0m20,102s
user    0m20,088s
sys     0m0,000s

stream
------

real    0m11,841s
user    0m11,780s
sys     0m0,024s

real    0m11,967s
user    0m11,932s
sys     0m0,008s

real    0m11,634s
user    0m11,616s
sys     0m0,008s

textparsing
-----------

real    0m37,209s
user    0m37,184s
sys     0m0,004s

real    0m36,336s
user    0m36,208s
sys     0m0,028s

real    0m35,840s
user    0m35,648s
sys     0m0,048s

Conclusions:

Gold medal: stream-based approach
Silver medal: basic builder approach (mean real time about 73% higher than stream-based)
Bronze medal: text parsing approach (mean real time about 209% higher than stream-based, i.e. roughly 3.1 times slower)
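The relative slowdowns can be recomputed from the mean "real" times of this second round of runs. A minimal sketch of that arithmetic (the function and constant names are illustrative):

```cpp
#include <numeric>
#include <vector>

// Arithmetic mean of a set of timings, in seconds.
double mean(const std::vector<double>& xs) {
    return std::accumulate(xs.begin(), xs.end(), 0.0) / static_cast<double>(xs.size());
}

// How much longer `other` takes than `baseline`, as a percentage of the baseline mean.
double percent_increase(const std::vector<double>& baseline, const std::vector<double>& other) {
    return (mean(other) / mean(baseline) - 1.0) * 100.0;
}

// "real" times from the second round of runs, in seconds.
const std::vector<double> kStream{11.841, 11.967, 11.634};
const std::vector<double> kBasic{20.725, 20.651, 20.102};
const std::vector<double> kTextParsing{37.209, 36.336, 35.840};
```

With these inputs, `percent_increase(kStream, kBasic)` is about 73.5 and `percent_increase(kStream, kTextParsing)` about 208.6: the basic builder took roughly 1.7 times and text parsing roughly 3.1 times as long as the stream builder.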

Before starting the experiment I would have bet that the basic builder would win, but in the end the stream-based one did. Is there something wrong with my test-basic.cpp code, or does the result make sense?