Simple JSON string escape for C++?


I have a very simple program that outputs a simple JSON string, which I concatenate manually and write to the std::cout stream (the output really is that simple). However, some of my strings could contain double quotes, curly braces and other characters that could break the JSON string. So I need a library (or, more accurately, a function) that escapes strings according to the JSON standard, as lightweight as possible, nothing more, nothing less.

I found a few libraries that encode whole objects into JSON, but given that my program is a single 900-line .cpp file, I would rather not rely on a library that is several times bigger than my program just to achieve something as simple as this.


Solution

  • Caveat

    Whichever solution you choose, keep in mind that the JSON standard requires you to escape all control characters. A common misconception is that only a handful of them need escaping, and many developers get this wrong.

    All control characters means everything from '\x00' to '\x1f', not just those with a short representation such as '\x0a' (also known as '\n'). For example, you must escape the '\x02' character as \u0002.

    See also: ECMA-404 - The JSON data interchange syntax, 2nd edition, December 2017, Page 4
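
    To make the rule concrete, here is a minimal sketch of a predicate that says whether a single byte must be escaped inside a JSON string. The name must_escape_in_json is made up for this illustration and is not part of any library. It assumes UTF-8 input, so bytes of 0x80 and above are left alone.

```cpp
#include <string>

// Hypothetical helper (name invented for this sketch): returns true if the
// byte must be escaped when it appears inside a JSON string literal.
bool must_escape_in_json(char c) {
    // Work on the unsigned value so the range check does not depend on
    // whether plain char is signed on this platform.
    const unsigned char u = static_cast<unsigned char>(c);
    return u == '"' || u == '\\' || u <= 0x1f;
}
```

    Note that curly braces do not need escaping inside a string, and neither does '\x7f': only the double quote, the backslash and the range '\x00' to '\x1f' are mandatory.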

    Simple solution

    If you know for sure that your input string is UTF-8 encoded, you can keep things simple.

    Since JSON allows you to escape everything via \uXXXX, even " and \, a simple solution is:

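    #include <iomanip>
    #include <sstream>
    #include <string>

    std::string escape_json(const std::string &s) {
        std::ostringstream o;
        for (auto c = s.cbegin(); c != s.cend(); c++) {
            // Escape '"', '\\' and every control character (0x00 to 0x1f) as \u00XX.
            if (*c == '"' || *c == '\\' || ('\x00' <= *c && *c <= '\x1f')) {
                o << "\\u"
                  << std::hex << std::setw(4) << std::setfill('0') << static_cast<int>(*c);
            } else {
                // Everything else, including UTF-8 multi-byte sequences, passes through unchanged.
                o << *c;
            }
        }
        return o.str();
    }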
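
    To see what the \uXXXX form looks like for a given character, here is a sketch of just the formatting step, written without streams. The helper name u_escape is hypothetical. It only handles single bytes, which is all the function above ever escapes.

```cpp
#include <string>

// Hypothetical helper (not from the answer above): formats one byte as a
// JSON \u00XX escape with lowercase hex digits.
std::string u_escape(unsigned char byte) {
    static const char digits[] = "0123456789abcdef";
    std::string out = "\\u00";      // the high byte is always 00 for a single byte
    out += digits[byte >> 4];       // upper four bits
    out += digits[byte & 0x0f];     // lower four bits
    return out;
}
```

    With this sketch, a double quote comes out as \u0022, a backslash as \u005c and '\x02' as \u0002.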
    

    Shortest representation

    For the shortest representation you may use JSON shortcuts, such as \" instead of \u0022. The following function produces the shortest JSON representation of a UTF-8 encoded string s:

    #include <iomanip>
    #include <sstream>
    #include <string>
    
    std::string escape_json(const std::string &s) {
        std::ostringstream o;
        for (auto c = s.cbegin(); c != s.cend(); c++) {
            switch (*c) {
            case '"': o << "\\\""; break;
            case '\\': o << "\\\\"; break;
            case '\b': o << "\\b"; break;
            case '\f': o << "\\f"; break;
            case '\n': o << "\\n"; break;
            case '\r': o << "\\r"; break;
            case '\t': o << "\\t"; break;
            default:
                if ('\x00' <= *c && *c <= '\x1f') {
                    o << "\\u"
                      << std::hex << std::setw(4) << std::setfill('0') << static_cast<int>(*c);
                } else {
                    o << *c;
                }
            }
        }
        return o.str();
    }
    

    Pure switch statement

    It is also possible to get by with a pure switch statement, that is, without if and <iomanip>. Although this is quite cumbersome, it may be preferable from a "security by simplicity and purity" point of view:

    #include <sstream>
    #include <string>
    
    std::string escape_json(const std::string &s) {
        std::ostringstream o;
        for (auto c = s.cbegin(); c != s.cend(); c++) {
            switch (*c) {
            case '\x00': o << "\\u0000"; break;
            case '\x01': o << "\\u0001"; break;
            ...
            case '\x0a': o << "\\n"; break;
            ...
            case '\x1f': o << "\\u001f"; break;
            case '\x22': o << "\\\""; break;
            case '\x5c': o << "\\\\"; break;
            default: o << *c;
            }
        }
        return o.str();
    }
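
    For reference, here is how the function might look with every case written out. This is a sketch that assumes the elided cases follow the same pattern as the ones shown, using the short forms \b, \t, \n, \f and \r where JSON defines them and \u00XX everywhere else.

```cpp
#include <sstream>
#include <string>

// Sketch of the pure-switch variant with all cases spelled out
// (assumption: the elided cases follow the pattern of the visible ones).
std::string escape_json(const std::string &s) {
    std::ostringstream o;
    for (char c : s) {
        switch (c) {
        case '\x00': o << "\\u0000"; break;
        case '\x01': o << "\\u0001"; break;
        case '\x02': o << "\\u0002"; break;
        case '\x03': o << "\\u0003"; break;
        case '\x04': o << "\\u0004"; break;
        case '\x05': o << "\\u0005"; break;
        case '\x06': o << "\\u0006"; break;
        case '\x07': o << "\\u0007"; break;
        case '\x08': o << "\\b"; break;
        case '\x09': o << "\\t"; break;
        case '\x0a': o << "\\n"; break;
        case '\x0b': o << "\\u000b"; break;
        case '\x0c': o << "\\f"; break;
        case '\x0d': o << "\\r"; break;
        case '\x0e': o << "\\u000e"; break;
        case '\x0f': o << "\\u000f"; break;
        case '\x10': o << "\\u0010"; break;
        case '\x11': o << "\\u0011"; break;
        case '\x12': o << "\\u0012"; break;
        case '\x13': o << "\\u0013"; break;
        case '\x14': o << "\\u0014"; break;
        case '\x15': o << "\\u0015"; break;
        case '\x16': o << "\\u0016"; break;
        case '\x17': o << "\\u0017"; break;
        case '\x18': o << "\\u0018"; break;
        case '\x19': o << "\\u0019"; break;
        case '\x1a': o << "\\u001a"; break;
        case '\x1b': o << "\\u001b"; break;
        case '\x1c': o << "\\u001c"; break;
        case '\x1d': o << "\\u001d"; break;
        case '\x1e': o << "\\u001e"; break;
        case '\x1f': o << "\\u001f"; break;
        case '\x22': o << "\\\""; break;   // double quote
        case '\x5c': o << "\\\\"; break;   // backslash
        default: o << c;                   // everything else passes through
        }
    }
    return o.str();
}
```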
    

    Using a library

    You might want to have a look at https://github.com/nlohmann/json, which is an efficient header-only C++ library (MIT License) that seems to be very well-tested.

    You can either call its escape_string() method directly (note that this is a bit tricky; see the comment by Lukas Salich), or you can take its implementation of escape_string() as a starting point for your own implementation:

    https://github.com/nlohmann/json/blob/ec7a1d834773f9fee90d8ae908a0c9933c5646fc/src/json.hpp#L4604-L4697