I am working on a lexer. I have a Token struct, which looks like this:
#include <string_view>

struct Token {
    enum class Type { /* ... */ };
    Type type;
    std::string_view lexeme;
};
The Token's lexeme is just a view into a small piece of the full source code (which, by the way, is also a std::string_view).
The problem is that I need to re-map special characters (for instance, '\n'), and storing them as-is isn't a nice solution.
I've tried replacing lexeme's type with std::variant<std::string, std::string_view>, but it quickly turned into spaghetti code: every time I want to read the lexeme (for example, to check whether the type is Bool and the lexeme is "true"), it's a big pain.
Storing lexeme as an owning string won't solve the problem.
By the way, I use C++20; is there perhaps a nice solution for this?
std::string
Firstly, a std::string could be used in a Token just as well as a std::string_view. This might not be as costly as you think, because std::string in all major C++ standard library implementations (libstdc++, libc++, and Microsoft's STL) uses the small string optimization (SSO), even though the standard does not require it.
This means that short tokens like "const" wouldn't be allocated on the heap; the characters would be stored directly inside the string object. Before bothering with std::string_view and std::variant, you might want to measure whether allocations are a performance issue at all. Otherwise, this is a case of premature optimization.
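To make the first option concrete, here is a minimal sketch of a Token that owns its lexeme, so remapped characters can simply be written into the string. The enumerators, the unescape() helper, and the set of escapes it handles are hypothetical, not taken from the question. It also includes a hypothetical uses_inline_buffer() that peeks at an implementation detail to show SSO at work; since the standard does not guarantee SSO, treat its result as informative only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

// Token that owns its lexeme. The enumerators are hypothetical placeholders;
// the question elides the real list.
struct Token {
    enum class Type { Bool, String, Identifier };
    Type type;
    std::string lexeme;
};

// Hypothetical helper: remaps the escape sequences \n, \t, \\ and \" in the
// source text to the characters they denote. Unknown escapes and a trailing
// lone backslash are copied unchanged.
std::string unescape(std::string_view src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] != '\\' || i + 1 == src.size()) {
            out += src[i];  // ordinary character
            continue;
        }
        switch (const char c = src[++i]) {  // character after the backslash
            case 'n':  out += '\n'; break;
            case 't':  out += '\t'; break;
            case '\\': out += '\\'; break;
            case '"':  out += '"';  break;
            default:   out += '\\'; out += c; break;
        }
    }
    return out;
}

// Returns true if s.data() points into the std::string object itself, i.e. the
// characters are stored inline (SSO). This inspects an implementation detail:
// the C++ standard permits SSO but does not require it.
bool uses_inline_buffer(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(s);
    const char* data = s.data();
    // std::less yields a total order even for pointers into unrelated objects.
    return !std::less<const char*>{}(data, obj_begin) &&
           std::less<const char*>{}(data, obj_end);
}
```

With this layout, a check like tok.type == Token::Type::Bool && tok.lexeme == "true" needs no std::visit at all. On libstdc++, libc++ and MSVC, uses_inline_buffer(std::string("const")) should report true, while a 100-character string should report false.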
std::variant
User @Homer512 has already provided a solid solution. Rather than using the std::variant directly, you could create a wrapper around it that provides a string-like interface for both std::string and std::string_view.
This is easy to do, because most member functions have the same name and meaning in both classes. That also makes them easy to use through std::visit.
#include <string>
#include <string_view>
#include <variant>

struct MaybeOwningString
{
    using variant_type = std::variant<std::string, std::string_view>;
    using size_type = std::string_view::size_type;

    variant_type v;

    // main member function which grants access to either alternative as a view
    std::string_view view() const noexcept {
        return std::visit([](const auto& str) -> std::string_view {
            return str;
        }, v);
    }

    // various helper functions which expose commonly used member functions
    bool empty() const noexcept {
        // helper functions can be implemented with std::visit, but this is verbose
        return std::visit([](const auto& str) {
            return str.empty();
        }, v);
    }

    size_type size() const noexcept {
        // helper functions can also be implemented by using view()
        return view().size();
    }

    // ...
};
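Since the pain point in the question was checks like "the type is Bool and the lexeme is "true"", one option is to give the wrapper a comparison operator so call sites never touch the variant. The sketch below is a minimal, self-contained illustration assuming C++17 or later: it repeats only view() from the wrapper, and the operator==, the Token enumerators, and is_true_literal() are hypothetical additions, not part of the answer above.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <variant>

// Trimmed-down copy of MaybeOwningString (only view() is needed here).
struct MaybeOwningString {
    std::variant<std::string, std::string_view> v;

    std::string_view view() const noexcept {
        return std::visit([](const auto& str) -> std::string_view { return str; }, v);
    }

    // Hypothetical addition: compare against anything convertible to
    // std::string_view (e.g. a string literal) without touching the variant.
    friend bool operator==(const MaybeOwningString& lhs, std::string_view rhs) noexcept {
        return lhs.view() == rhs;
    }
};

struct Token {
    enum class Type { Bool, String };  // hypothetical enumerators
    Type type;
    MaybeOwningString lexeme;
};

// Hypothetical predicate: the check from the question, now a one-liner.
bool is_true_literal(const Token& tok) {
    return tok.type == Token::Type::Bool && tok.lexeme == "true";
}
```

A lexeme that is a plain slice of the source can be stored as MaybeOwningString{std::string_view(...)}, while a remapped one is stored as MaybeOwningString{std::string(...)}; both compare the same way. Note that MaybeOwningString{"true"} would not compile, because a string literal converts equally well to either alternative.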