I found a project from a few years ago that does some simple command-line parsing. While I really like its functionality, it does not support parsing special characters such as <, >, &, etc. I attempted to add support for these characters by adding conditions modeled on the ones the existing code uses to detect whitespace, escape characters, and quotes:
bool _isQuote(char c) {
if (c == '\"')
return true;
else if (c == '\'')
return true;
return false;
}
bool _isEscape(char c) {
if (c == '\\')
return true;
return false;
}
bool _isWhitespace(char c) {
if (c == ' ')
return true;
else if(c == '\t')
return true;
return false;
}
.
.
.
What I added:
bool _isLeftCarrot(char c) {
if (c == '<')
return true;
return false;
}
bool _isRightCarrot(char c) {
if (c == '>')
return true;
return false;
}
and so on for the rest of the special characters.
I also tried the same approach as the existing code in the parse method:
std::list<std::string> parse(const std::string& args) {
std::stringstream ain(args); // iterates over the input string
ain >> std::noskipws; // ensures not to skip whitespace
std::list<std::string> oargs; // list of strings where we will store the tokens
std::stringstream currentArg("");
currentArg >> std::noskipws;
// current state
enum State {
InArg, // scanning the string currently
InArgQuote, // scanning the string that started with a quote currently
OutOfArg // not scanning the string currently
};
State currentState = OutOfArg;
char currentQuoteChar = '\0'; // used to differentiate between ' and "
// ex. "sample'text"
char c;
std::stringstream ss;
std::string s;
// iterate character by character through input string
while(!ain.eof() && (ain >> c)) {
// if current character is a quote
if(_isQuote(c)) {
switch(currentState) {
case OutOfArg:
currentArg.str(std::string());
case InArg:
currentState = InArgQuote;
currentQuoteChar = c;
break;
case InArgQuote:
if (c == currentQuoteChar)
currentState = InArg;
else
currentArg << c;
break;
}
}
// if current character is whitespace
else if (_isWhitespace(c)) {
switch(currentState) {
case InArg:
oargs.push_back(currentArg.str());
currentState = OutOfArg;
break;
case InArgQuote:
currentArg << c;
break;
case OutOfArg:
// nothing
break;
}
}
// if current character is escape character
else if (_isEscape(c)) {
switch(currentState) {
case OutOfArg:
currentArg.str(std::string());
currentState = InArg;
case InArg:
case InArgQuote:
if (ain.eof())
{
currentArg << c;
throw(std::runtime_error("Found Escape Character at end of file."));
}
else {
char c1 = c;
ain >> c;
if (c != '\"')
currentArg << c1;
ain.unget();
ain >> c;
currentArg << c;
}
break;
}
}
What I added in the parse method:
// if current character is left carrot (<)
else if(_isLeftCarrot(c)) {
// convert from char to string and push onto list
ss << c;
ss >> s;
oargs.push_back(s);
}
// if current character is right carrot (>)
else if(_isRightCarrot(c)) {
ss << c;
ss >> s;
oargs.push_back(s);
}
.
.
.
else {
switch(currentState) {
case InArg:
case InArgQuote:
currentArg << c;
break;
case OutOfArg:
currentArg.str(std::string());
currentArg << c;
currentState = InArg;
break;
}
}
}
if (currentState == InArg) {
oargs.push_back(currentArg.str());
s.clear();
}
else if (currentState == InArgQuote)
throw(std::runtime_error("Starting quote has no ending quote."));
return oargs;
}
parse will return a list of strings of the tokens.
However, I am running into issues with a specific test case when the special character is attached to the end of the input. For example, the input
foo-bar&
will return this list: [{&},{foo-bar}]
instead of what I want: [{foo-bar},{&}]
I'm struggling to fix this issue. I am new to C++, so any advice along with some explanation would be a great help.
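When you handle one of your special characters, you need to do the same sorts of things that the original code does when it encounters a space. Your branches push the special character onto the list immediately, while foo-bar is still sitting in currentArg and is only pushed after the loop ends, which is why the order comes out reversed. You need to look at currentState first: if you are in the middle of an argument (InArg), save the current argument and reset the state to OutOfArg, since you are no longer in one, and only then push the special character. If you are inside a quoted argument (InArgQuote), the character should probably just be appended to the current argument, as whitespace is.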
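Here is a minimal sketch of that idea. It is a simplified stand-in for the parser in the question, not the original project's code: escape handling is left out, and the names isSpecial and tokenize are hypothetical. The part that matters is the isSpecial branch, which flushes the pending argument before pushing the operator. It also builds the one-character token with std::string(1, c) instead of going through the shared stringstream, which, as far as I can tell, is left in an end-of-file state after the first extraction and so would not work reliably for a second special character.

```cpp
#include <list>
#include <stdexcept>
#include <string>

// Hypothetical helper: one predicate covering the single-character operators.
bool isSpecial(char c) {
    return c == '<' || c == '>' || c == '&' || c == '|';
}

// Simplified tokenizer (assumption: no escape handling, unlike the original).
std::list<std::string> tokenize(const std::string& input) {
    enum State { InArg, InArgQuote, OutOfArg };
    State state = OutOfArg;
    char quoteChar = '\0';
    std::string current;              // argument being built
    std::list<std::string> out;       // finished tokens

    for (char c : input) {
        if (state == InArgQuote) {
            // Inside quotes everything is literal except the closing quote.
            if (c == quoteChar) state = InArg;
            else current += c;
        } else if (c == '"' || c == '\'') {
            state = InArgQuote;
            quoteChar = c;
        } else if (c == ' ' || c == '\t') {
            if (state == InArg) {
                out.push_back(current);
                current.clear();
                state = OutOfArg;
            }
        } else if (isSpecial(c)) {
            // Same as whitespace: flush the pending argument first...
            if (state == InArg) {
                out.push_back(current);
                current.clear();
                state = OutOfArg;
            }
            // ...then emit the operator as its own token.
            out.push_back(std::string(1, c));
        } else {
            current += c;
            state = InArg;
        }
    }

    if (state == InArgQuote)
        throw std::runtime_error("Starting quote has no ending quote.");
    if (state == InArg)
        out.push_back(current);
    return out;
}
```

With this, tokenize("foo-bar&") yields foo-bar followed by &, and a quoted operator such as the one in echo 'x > y' stays part of its argument. In your parse method the equivalent change is a switch on currentState inside each special-character branch, mirroring the whitespace case.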