@polygenelubricants' answer to this question includes a C# regex that splits a PascalCase string into separate words:
Regex r = new Regex(
@" (?<=[A-Z])(?=[A-Z][a-z]) # UC before me, UC lc after me
| (?<=[^A-Z])(?=[A-Z]) # Not UC before me, UC after me
| (?<=[A-Za-z])(?=[^A-Za-z]) # Letter before me, non letter after me
",
RegexOptions.IgnorePatternWhitespace
);
I would like to use the same regular expression in C++. However, the regular expression syntax of the C++ standard library (std::regex) does not permit lookbehinds of the form (?<=...). Is it possible to make this work anyway?
EDIT: This is clearly not a duplicate. I know C++ doesn't support lookbehinds, I'm asking if the same functionality can be implemented WITHOUT THEM. For reference, here's how to do it with Boost regex, which does support lookbehinds and which I would ideally like to avoid using:
#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string/regex.hpp>
#include <boost/regex.hpp>

int main()
{
    // Same three alternatives as the C# regex, including the lookbehinds.
    boost::regex r(
        "(?<=[A-Z])(?=[A-Z][a-z])"
        "|(?<=[^A-Z])(?=[A-Z])"
        "|(?<=[A-Za-z])(?=[^A-Za-z])"
    );
    std::vector<std::string> input {
        "AutomaticTrackingSystem",
        "XMLEditor",
        "AnXMLAndXSLT2.0Tool"
    };
    for (auto const &str : input) {
        std::vector<std::string> str_split;
        boost::algorithm::split_regex(str_split, str, r);
        for (auto const &str_ : str_split)
            std::cout << str_ << std::endl;
    }
}
You can rewrite the regex so that it does not use lookbehinds:
[A-Z](?=[A-Z][a-z])|[^A-Z](?=[A-Z])|[A-Za-z](?=[^A-Za-z])
The original regex looks for the position where a new word begins, so it has to look behind for the end of the previous word. Instead, we can match the last character of a word and look ahead for the beginning of the next word. Each match then consumes exactly one character, so the split position is the match position plus 1.
#include <cstddef>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main()
{
    const std::sregex_iterator End;
    // An empty input string is not handled correctly here
    // (it yields a single empty piece); treat it as a special case.
    std::string str = "ThisIsAPascalStringX";
    std::regex rx("[A-Z](?=[A-Z][a-z])|[^A-Z](?=[A-Z])|[A-Za-z](?=[^A-Za-z])");
    std::vector<std::string> pieces;
    std::size_t lastStartPosition = 0;
    for (auto i(std::sregex_iterator(str.begin(), str.end(), rx)); i != End; ++i)
    {
        // Each match is the last character of a word, so the next word starts one past it.
        std::size_t startPosition = static_cast<std::size_t>(i->position()) + 1;
        pieces.push_back(str.substr(lastStartPosition, startPosition - lastStartPosition));
        lastStartPosition = startPosition;
    }
    // Whatever follows the last split point is the final word.
    pieces.push_back(str.substr(lastStartPosition));
    std::cout << "<-- start" << std::endl;
    for (auto& s : pieces)
    {
        std::cout << s << std::endl;
    }
    std::cout << "<-- end" << std::endl;
}
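For reuse, the same idea can be wrapped in a function. This is only a sketch: the name split_pascal_case is my own and not part of any library, and returning an empty vector for an empty input is just one way to handle the special case mentioned above.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (the name is an assumption, not from the original answer).
// Splits a PascalCase string using the lookbehind-free regex shown above.
std::vector<std::string> split_pascal_case(const std::string& str)
{
    std::vector<std::string> pieces;
    if (str.empty())
        return pieces;  // assumption: an empty input yields no pieces

    // Each alternative matches the LAST character of a word and looks ahead
    // for the start of the next word.
    static const std::regex rx(
        "[A-Z](?=[A-Z][a-z])"
        "|[^A-Z](?=[A-Z])"
        "|[A-Za-z](?=[^A-Za-z])");

    std::size_t last = 0;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(str.begin(), str.end(), rx); it != end; ++it) {
        // The split point is one past the matched character.
        const std::size_t start = static_cast<std::size_t>(it->position()) + 1;
        pieces.push_back(str.substr(last, start - last));
        last = start;
    }
    // The remainder after the last split point is the final word.
    pieces.push_back(str.substr(last));
    return pieces;
}
```

With the inputs from the question, split_pascal_case("XMLEditor") gives XML and Editor, and split_pascal_case("AnXMLAndXSLT2.0Tool") gives An, XML, And, XSLT, 2.0 and Tool.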