I'm working on a state machine which is supposed to extract function calls of the form
/* I am a comment */
//I am a comment
pref("this.is.a.string.which\"can have QUOTES\"", 123456);
where the extracted data would be pref("this.is.a.string.which\"can have QUOTES\"", 123456);
from a file. Currently, to process a 41kb file, this process is taking close to a minute and a half. Is there something I'm seriously misunderstanding here about this finite state machine?
#include <boost/algorithm/string.hpp>
std::vector<std::string> Foo()
{
std::string fileData;
//Fill filedata with the contents of a file
std::vector<std::string> results;
std::string::iterator begin = fileData.begin();
std::string::iterator end = fileData.end();
std::string::iterator stateZeroFoundLocation = fileData.begin();
std::size_t state = 0;
for(; begin < end; begin++)
{
switch (state)
{
case 0:
if (boost::starts_with(boost::make_iterator_range(begin, end), "pref(")) {
stateZeroFoundLocation = begin;
begin += 4;
state = 2;
} else if (*begin == '/')
state = 1;
break;
case 1:
state = 0;
switch (*begin)
{
case '*':
begin = boost::find_first(boost::make_iterator_range(begin, end), "*/").end();
break;
case '/':
begin = std::find(begin, end, L'\n');
}
break;
case 2:
if (*begin == '"')
state = 3;
break;
case 3:
switch(*begin)
{
case '\\':
state = 4;
break;
case '"':
state = 5;
}
break;
case 4:
state = 3;
break;
case 5:
if (*begin == ',')
state = 6;
break;
case 6:
if (*begin != ' ')
state = 7;
break;
case 7:
switch(*begin)
{
case '"':
state = 8;
break;
default:
state = 10;
break;
}
break;
case 8:
switch(*begin)
{
case '\\':
state = 9;
break;
case '"':
state = 10;
}
break;
case 9:
state = 8;
break;
case 10:
if (*begin == ')')
state = 11;
break;
case 11:
if (*begin == ';')
state = 12;
break;
case 12:
state = 0;
results.push_back(std::string(stateZeroFoundLocation, begin));
};
}
return results;
}
Billy3
EDIT: Well this is one of the strangest things I've ever seen. I just rebuilt the project and it's running reasonably again. Odd.
Unless your 41 kb file is mostly comments or prefs, it's going to spend most of its time in state 0. And for each character in state 0, you make a minimum of two function calls.
if (boost::starts_with(boost::make_iterator_range(begin, end), "pref(")) {
You can speed this up by pre-testing to see if the current character is 'p'
if (*begin == 'p' && boost::starts_with(boost::make_iterator_range(begin, end), "pref(")) {
If the character isn't 'p' then there is no need to make any function calls. In particular not creating a iterator, which is probably where the time is being spent.