I have a very simple program that outputs a simple JSON string, which I concatenate manually and write to the std::cout stream (the output really is that simple). However, some of my strings can contain double quotes, curly braces, and other characters that could break the JSON string. So I need a library (or, more accurately, a function) that escapes strings according to the JSON standard, as lightweight as possible, nothing more, nothing less.
I found a few libraries that encode whole objects into JSON, but given that my program is a single 900-line .cpp file, I would rather not rely on a library that is several times bigger than my program just to achieve something as simple as this.
Caveat
Whatever solution you choose, keep in mind that the JSON standard requires you to escape all control characters. Escaping only some of them seems to be a common mistake; many developers get this wrong.
"All control characters" means everything from '\x00' to '\x1f', not just those with a short representation such as '\x0a' (also known as '\n'). For example, you must escape the '\x02' character as \u0002.
See also: ECMA-404 - The JSON data interchange syntax, 2nd edition, December 2017, Page 4
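To make the rule concrete, here is a minimal sketch of the check. The helper names `must_escape_in_json` and `is_json_safe` are hypothetical and not taken from any library. The cast to `unsigned char` is there so that the comparison does not depend on whether plain `char` is signed on your platform.

```cpp
#include <string>

// Hypothetical helper: true if the byte may not appear unescaped inside
// a JSON string (quotation mark, reverse solidus, and the control
// characters U+0000 through U+001F).
bool must_escape_in_json(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    return u == '"' || u == '\\' || u <= 0x1f;
}

// Hypothetical helper: true if s can be copied into a JSON string as-is.
bool is_json_safe(const std::string &s) {
    for (char c : s) {
        if (must_escape_in_json(c)) return false;
    }
    return true;
}
```

Note that curly braces do not need escaping inside a JSON string, so `is_json_safe("hello {world}")` is true, while any string containing '\x02' or '\n' is not safe.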
Simple solution
If you know for sure that your input string is UTF-8 encoded, you can keep things simple.
Since JSON allows you to escape everything via \uXXXX, even " and \, a simple solution is:
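#include <iomanip>
#include <sstream>
#include <string>

// Escapes '"', '\\' and all control characters as \uXXXX.
std::string escape_json(const std::string &s) {
    std::ostringstream o;
    for (auto c = s.cbegin(); c != s.cend(); c++) {
        if (*c == '"' || *c == '\\' || ('\x00' <= *c && *c <= '\x1f')) {
            // All matched characters are in 0x00..0x5c, so the cast is non-negative.
            o << "\\u"
              << std::hex << std::setw(4) << std::setfill('0') << static_cast<int>(*c);
        } else {
            // Everything else, including UTF-8 multi-byte sequences, passes through.
            o << *c;
        }
    }
    return o.str();
}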
Shortest representation
For the shortest representation you may use JSON shortcuts, such as \" instead of \u0022. The following function produces the shortest JSON representation of a UTF-8 encoded string s:
#include <iomanip>
#include <sstream>
#include <string>

std::string escape_json(const std::string &s) {
    std::ostringstream o;
    for (auto c = s.cbegin(); c != s.cend(); c++) {
        switch (*c) {
        // Characters with a two-character shortcut.
        case '"': o << "\\\""; break;
        case '\\': o << "\\\\"; break;
        case '\b': o << "\\b"; break;
        case '\f': o << "\\f"; break;
        case '\n': o << "\\n"; break;
        case '\r': o << "\\r"; break;
        case '\t': o << "\\t"; break;
        default:
            if ('\x00' <= *c && *c <= '\x1f') {
                // Remaining control characters have no shortcut: use \uXXXX.
                o << "\\u"
                  << std::hex << std::setw(4) << std::setfill('0') << static_cast<int>(*c);
            } else {
                o << *c;
            }
        }
    }
    return o.str();
}
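For the manual-concatenation use case from the question, a usage sketch might look like the following. Everything here is an assumption for illustration: `escape_json_compact` is a hypothetical stand-in that is meant to produce the same output as the function above, but appends to a std::string with a hex digit table instead of using a stream, and `json_member` is a hypothetical helper that builds a single "key": "value" pair.

```cpp
#include <string>

// Hypothetical stand-in for the escaper above: same escaping rules,
// but appends to a std::string instead of writing to a stream.
std::string escape_json_compact(const std::string &s) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        const unsigned char u = static_cast<unsigned char>(c);
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\b': out += "\\b";  break;
        case '\f': out += "\\f";  break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (u <= 0x1f) {
                // Remaining control characters become \u00XX.
                out += "\\u00";
                out += hex[u >> 4];
                out += hex[u & 0x0f];
            } else {
                out += c;  // includes bytes of UTF-8 multi-byte sequences
            }
        }
    }
    return out;
}

// Hypothetical helper: builds one "key": "value" member with both
// parts escaped and quoted.
std::string json_member(const std::string &key, const std::string &value) {
    return "\"" + escape_json_compact(key) + "\": \"" +
           escape_json_compact(value) + "\"";
}
```

With this, the output statement could be something like `std::cout << "{" << json_member("msg", input) << "}\n";`, where `input` is whatever string your program wants to emit.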
Pure switch statement
It is also possible to get along with a pure switch statement, that is, without if and <iomanip>. While this is quite cumbersome, it may be preferable from a "security by simplicity and purity" point of view:
#include <sstream>
#include <string>

std::string escape_json(const std::string &s) {
    std::ostringstream o;
    for (auto c = s.cbegin(); c != s.cend(); c++) {
        switch (*c) {
        case '\x00': o << "\\u0000"; break;
        case '\x01': o << "\\u0001"; break;
        ...
        case '\x0a': o << "\\n"; break;
        ...
        case '\x1f': o << "\\u001f"; break;
        case '\x22': o << "\\\""; break;
        case '\x5c': o << "\\\\"; break;
        default: o << *c;
        }
    }
    return o.str();
}
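Because a hand-written switch with this many cases makes it easy to skip one, a small self-check can help. The following is only a sketch: `escapes_all_required` is a hypothetical helper that takes any escaper with the signature used above, feeds it each character that JSON requires to be escaped, and checks that the result starts with a backslash and contains no raw control byte. It does not verify that the escape sequence itself is the right one.

```cpp
#include <string>

// Hypothetical self-check for an escaper with the signature
// std::string f(const std::string &).
bool escapes_all_required(std::string (*escape)(const std::string &)) {
    // Collect every character that must be escaped: 0x00..0x1f, '"', '\\'.
    std::string required;
    for (int i = 0; i <= 0x1f; ++i) required += static_cast<char>(i);
    required += '"';
    required += '\\';

    for (char c : required) {
        const std::string out = escape(std::string(1, c));
        // An escape sequence is at least two characters and starts with '\'.
        if (out.size() < 2 || out[0] != '\\') return false;
        // No raw control byte may survive in the output.
        for (char o : out) {
            if (static_cast<unsigned char>(o) <= 0x1f) return false;
        }
    }
    return true;
}
```

Calling `escapes_all_required(escape_json)` once at startup or in a test should return true; an escaper that forgets a case returns false.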
Using a library
You might want to have a look at https://github.com/nlohmann/json, which is an efficient header-only C++ library (MIT License) that seems to be very well-tested.
You can either call their escape_string() method directly (note that this is a bit tricky, see the comment below by Lukas Salich), or you can take their implementation of escape_string() as a starting point for your own implementation: