Im writing http parser and have these functions
int parse_useragent(char* buf, int length){
buf[length] = '\0';
if(strstr(buf, "MSIE") != NULL){
return 1;
}else if(strstr(buf, "Firefox") != NULL){
return 2;
}
return DEFAULT_USERAGENT;
}
void parse_headers(unsigned char* buf, http_record_t * http){
char * position = (char*)buf;
char referer[] = "Referer";
char useragent[] = "User-Agent";
...
int length = getlinelength(position); // returns length of line
while(length != 1){ // position points to start of line every iteration of cycle
if(strncmp(position, useragent, sizeof(useragent)-1) == 0){
http->useragent = parse_useragent(position, length);
fprintf(stderr,"parsing useragent \n");
}else if(strncmp(position, referer, sizeof(referer)-1) == 0){
fprintf(stderr,"parsing referer \n");
char * tmp = malloc(REFERER_LENGHT * sizeof(char));
parse_referer(tmp,position, length);
strncpy(http->referer,tmp, REFERER_LENGHT * sizeof(char) - 1);
}else if(...
position += length + 1;
length = getlinelength(position);
}
return;
}
buf
points to start of http headers.
I have function like parse_useragent
for each header and I really need to optimize them.
Length of packet is usually <1000 and length of line rarely exceeds 100 value.
Will optimisation on such short strings have any noticable effect?
I know that some of these algorithms require different approach of parsing then line-by-line. Which way to choose under these specific conditions?
Thanks for help!
If you don't mind hard code the string into the code, I think lex would be the fastest tool to do this kind of task. Because it builds a finite state automaton explicitly in the source code.
Here's the sample lex code to do this task:
%option noyywrap
%{
enum Type{
TFIREFOX = 0, TMSIE = 1
};
enum Type global_variable; /* the variable to store the parsing result */
%}
%%
FIREFOX {global_variable = TFIREFOX; yyterminate();}
MSIE {global_variable = TMSIE; yyterminate();}
. {}
%%
int lex_strstr(char *buf, int n)
{
global_variable = -1;
YY_BUFFER_STATE bs = yy_scan_buffer(buf, n);
yy_switch_to_buffer(bs);
yylex();
return global_variable;
}
Store it in some file like resulte.l
and use flex to compile it to get a c header file:
flex -o head.h result.l
And here's an example to show how this work:
#include "head.h"
int main()
{
{
char buf[] = "this is a test MSIE string\0\0";
printf("%d\n", lex_strstr(buf, (sizeof buf)));
}
{
char buf[] = "this is a test FIREFOX string\0\0";
printf("%d\n", lex_strstr(buf, (sizeof buf)));
}
{
char buf[] = "this is a test MSIEFIREFOX string\0\0";
printf("%d\n", lex_strstr(buf, (sizeof buf)));
}
{
char buf[] = "this is a test MIEFIEFOXdfa\0\0";
printf("%d\n", lex_strstr(buf, (sizeof buf)));
}
}
with result:
1
0
1
-1