c++ mongodb mongo-cxx-driver

MongoDB C++ driver BSON construction: stream-based vs. string-parsing-based. Which one performs better?


The MongoDB C++ driver offers two ways (among others) to create BSON documents.

Stream-based:

auto builder = bsoncxx::builder::stream::document{};
bsoncxx::document::value doc_value = builder
  << "name" << "MongoDB"
  << "type" << "database"
  << "count" << 1
  << "versions" << bsoncxx::builder::stream::open_array
    << "v3.2" << "v3.0" << "v2.6"
  << bsoncxx::builder::stream::close_array
  << "info" << bsoncxx::builder::stream::open_document
    << "x" << 203
    << "y" << 102
  << bsoncxx::builder::stream::close_document
  << bsoncxx::builder::stream::finalize;

Based on parsing a JSON string:

std::string doc = "{ "
  "\"name\" : \"MongoDB\","
  "\"type\" : \"database\","
  "\"count\" : 1,"
  "\"versions\": [ \"v3.2\", \"v3.0\", \"v2.6\" ],"
  "\"info\" : {"
    "\"x\" : 203,"
    "\"y\" : 102"
  "}"
"}";
bsoncxx::document::value doc_value = bsoncxx::from_json(doc);

I would like to know which one is better from a performance point of view. I tend to think that the number of function calls the stream alternative makes under the hood will cost more than processing the JSON string, but it could be the other way around, or the two could perform the same.

I have tried to find information about this in the MongoDB C++ driver documentation, with no luck. Any information is welcome. Thanks in advance!


Solution

  • I ended up doing some benchmarking myself. I'm sharing my results in case they are useful to others. The driver version is 3.4.0.

    This is the stream-based version:

    #include <iostream>
    
    #include <bsoncxx/builder/stream/document.hpp>
    #include <bsoncxx/json.hpp>
    
    #include <mongocxx/client.hpp>
    #include <mongocxx/instance.hpp>
    
    
    int main(int, char**) {
        mongocxx::instance inst{};
        mongocxx::client conn{mongocxx::uri{}};
    
        for (unsigned int ix = 0; ix < 10000000; ++ix) {
            auto builder = bsoncxx::builder::stream::document{};
            bsoncxx::document::value doc_value = builder
                << "name" << "MongoDB"
                << "type" << "database"
                << "count" << 1
                << "versions" << bsoncxx::builder::stream::open_array
                  << "v3.2" << "v3.0" << "v2.6"
                << bsoncxx::builder::stream::close_array
                << "info" << bsoncxx::builder::stream::open_document
                  << "x" << 203
                  << "y" << 102
                << bsoncxx::builder::stream::close_document
                << bsoncxx::builder::stream::finalize;
        }
    }
    

    This is the text-parsing-based version:

    #include <iostream>
    
    #include <bsoncxx/builder/stream/document.hpp>
    #include <bsoncxx/json.hpp>
    
    #include <mongocxx/client.hpp>
    #include <mongocxx/instance.hpp>
    
    
    int main(int, char**) {
        mongocxx::instance inst{};
        mongocxx::client conn{mongocxx::uri{}};
    
        for (unsigned int ix = 0; ix < 10000000; ++ix) {
            std::string doc = "{ "
                "\"name\" : \"MongoDB\","
                "\"type\" : \"database\","
                "\"count\" : 1,"
                "\"versions\": [ \"v3.2\", \"v3.0\", \"v2.6\" ],"
                "\"info\" : {"
                  "\"x\" : 203,"
                  "\"y\" : 102"
                "}"
            "}";
            bsoncxx::document::value doc_value = bsoncxx::from_json(doc);
        }
    }
    

    As you can see, the structure of the program and the number of iterations (10,000,000) are the same in both cases.

    Compiled using:

    c++ --std=c++11 test-stream.cpp -o test-stream $(pkg-config --cflags --libs libmongocxx)
    c++ --std=c++11 test-textparsing.cpp -o test-textparsing $(pkg-config --cflags --libs libmongocxx)
    

    The results with test-stream (three times):

    $ time ./test-stream ; time ./test-stream ; time ./test-stream 
    
    real    0m16,454s
    user    0m16,200s
    sys 0m0,084s
    
    real    0m17,034s
    user    0m16,900s
    sys 0m0,012s
    
    real    0m18,812s
    user    0m18,708s
    sys 0m0,036s
    

    The results with test-textparsing (also three times):

    $ time ./test-textparsing ; time ./test-textparsing ; time ./test-textparsing 
    
    real    0m53,678s
    user    0m53,576s
    sys 0m0,024s
    
    real    1m0,203s
    user    0m59,788s
    sys 0m0,116s
    
    real    0m57,259s
    user    0m56,824s
    sys 0m0,200s
    

    Conclusion: the stream-based strategy outperforms the text-parsing one by a wide margin, running roughly three times faster in these runs.

    A peer check of the experiment would be great to confirm results ;)

    EDIT: I have added a test case based on the basic builder:

    #include <iostream>
    
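    #include <bsoncxx/builder/basic/array.hpp>
    #include <bsoncxx/builder/basic/document.hpp>
    #include <bsoncxx/builder/basic/kvp.hpp>
    #include <bsoncxx/builder/stream/document.hpp>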
    #include <bsoncxx/json.hpp>
    
    #include <mongocxx/client.hpp>
    #include <mongocxx/instance.hpp>
    
    using bsoncxx::builder::basic::kvp;
    
    int main(int, char**) {
        mongocxx::instance inst{};
        mongocxx::client conn{mongocxx::uri{}};
    
        for (unsigned int ix = 0; ix < 10000000; ++ix) {
            bsoncxx::builder::basic::document basic_builder{};
            basic_builder.append(kvp("name", "MongoDB"));
            basic_builder.append(kvp("type", "database"));
            basic_builder.append(kvp("count", 1));

            bsoncxx::builder::basic::array array_builder{};
            array_builder.append("v3.2");
            array_builder.append("v3.0");
            array_builder.append("v2.6");
            basic_builder.append(kvp("versions", array_builder.extract()));

            bsoncxx::builder::basic::document object_builder{};
            object_builder.append(kvp("x", 203));
            object_builder.append(kvp("y", 102));
            basic_builder.append(kvp("info", object_builder.extract()));

            bsoncxx::document::value doc_value = basic_builder.extract();
        }
    }
    

    Compiled this way:

    c++ --std=c++11 test-basic.cpp -o test-basic $(pkg-config --cflags --libs libmongocxx)
    

    I ran all the tests again, with these results:

    basic
    -----
    
    real    0m20,725s
    user    0m20,656s
    sys 0m0,004s
    
    real    0m20,651s
    user    0m20,620s
    sys 0m0,008s
    
    real    0m20,102s
    user    0m20,088s
    sys 0m0,000s
    
    stream
    ------
    
    real    0m11,841s
    user    0m11,780s
    sys 0m0,024s
    
    real    0m11,967s
    user    0m11,932s
    sys 0m0,008s
    
    real    0m11,634s
    user    0m11,616s
    sys 0m0,008s
    
    textparsing
    -----------
    
    real    0m37,209s
    user    0m37,184s
    sys 0m0,004s
    
    real    0m36,336s
    user    0m36,208s
    sys 0m0,028s
    
    real    0m35,840s
    user    0m35,648s
    sys 0m0,048s
    

    Conclusions:

    • Gold medal: stream-based approach
    • Silver medal: basic builder approach (mean real time about 73.5% higher than stream-based)
    • Bronze medal: text parsing approach (mean real time about 208.6% higher than stream-based)

    Before starting the experiment I would have bet that the basic builder would win, but in the end it was the stream-based one. Maybe there is something wrong in my test-basic.cpp code? Or does the result make sense?