I am trying to tokenize an S-expression that contains values of different types and classify each token by its type. I am pretty new to regex, so I am not sure why my code only recognizes parentheses and sends every other token to the else branch ("Variable name or string"). If you have any idea why my regexes aren't matching the tokens, please let me know!
#include <string>
#include <regex>
#include <vector>
#include <iostream>
#define print(var) std::cout << var << std::endl

std::string INT_REGEX = "\b[0-9]{1,3}[0-9]{1,3}\b",
            DOUBLE_REGEX = "\b[0-9]{1,3}.[0-9]{1,3}\b",
            BOOLEAN_REGEX = "^(true|false)$";

bool matchRegex(std::string pattern, std::string inputString) {
    std::regex expression(pattern);
    return std::regex_match(inputString, expression);
}

void detectTokenType(std::string strToken) {
    if (strToken == "(" | strToken == ")")
        print("Parenthesis");
    else if (matchRegex(INT_REGEX, strToken))
        print("Integer");
    else if (matchRegex(DOUBLE_REGEX, strToken))
        print("Double");
    else if (matchRegex(DOUBLE_REGEX, strToken))
        print("Boolean");
    else
        print("Variable name or string");
}

void tokenize(std::string listData) {
    std::vector<char> tokenBuffer;
    for (int i = 0; i < listData.length(); i++) {
        char currChar = listData[i];
        if (i == listData.length() - 1) {
            tokenBuffer.push_back(currChar);
            std::string strToken(tokenBuffer.begin(), tokenBuffer.end());
            detectTokenType(strToken);
        }
        else if (currChar != ' ') {
            tokenBuffer.push_back(currChar);
        }
        else {
            std::string strToken(tokenBuffer.begin(), tokenBuffer.end());
            tokenBuffer.clear();
            detectTokenType(strToken);
        }
    }
}

int main() {
    std::string codeSnippet = "( 2 3.0 true )";
    tokenize(codeSnippet);
    return 0;
}
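In your regex strings you are using `\b`. Inside an ordinary C++ string literal, `\b` is the escape sequence for a backspace character, so the regex engine never sees a word boundary. To get a word boundary you need `\\b`. Similarly, the `.` has a special meaning in a regex (it is a wildcard that matches any character). If you want to match a literal `.`, you need `\\.`.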
Also, INT_REGEX requires at least two digits, because `[0-9]{1,3}` appears twice in a row, so a single-digit token such as `2` never matches. One quantified group is enough:
std::string INT_REGEX = "\\b[0-9]{1,3}\\b",
            DOUBLE_REGEX = "\\b[0-9]{1,3}\\.[0-9]{1,3}\\b",
            BOOLEAN_REGEX = "^(true|false)$";
Also, the branch that prints "Boolean" checks DOUBLE_REGEX a second time instead of BOOLEAN_REGEX, so a token such as `true` can never reach it. Change that condition to use BOOLEAN_REGEX.
Here's a demo.