Search code examples
cparsingsearchoptimizationstrstr

Most optimized strstr() like function for my code


Im writing http parser and have these functions

int parse_useragent(char* buf, int length){
    buf[length] = '\0';
    if(strstr(buf, "MSIE") != NULL){
        return 1;
    }else if(strstr(buf, "Firefox") != NULL){
        return 2;
    }
    return DEFAULT_USERAGENT;
}

void parse_headers(unsigned char* buf, http_record_t * http){
    char * position = (char*)buf;
    char referer[] = "Referer";
    char useragent[] = "User-Agent";
    ...
    int length = getlinelength(position); // returns length of line
    while(length != 1){ // position points to start of line every iteration of cycle
        if(strncmp(position, useragent, sizeof(useragent)-1) == 0){
            http->useragent = parse_useragent(position, length);
            fprintf(stderr,"parsing useragent \n");
        }else if(strncmp(position, referer, sizeof(referer)-1) == 0){
            fprintf(stderr,"parsing referer \n");
            char * tmp = malloc(REFERER_LENGHT * sizeof(char));
            parse_referer(tmp,position, length);
            strncpy(http->referer,tmp, REFERER_LENGHT * sizeof(char) - 1);
        }else if(...

        position += length + 1;
        length = getlinelength(position);
    }
    return;
}

buf points to start of http headers.

I have function like parse_useragent for each header and I really need to optimize them. Length of packet is usually <1000 and length of line rarely exceeds 100 value. Will optimisation on such short strings have any noticable effect?

I know that some of these algorithms require different approach of parsing then line-by-line. Which way to choose under these specific conditions?

Thanks for help!


Solution

  • If you don't mind hard code the string into the code, I think lex would be the fastest tool to do this kind of task. Because it builds a finite state automaton explicitly in the source code.

    Here's the sample lex code to do this task:

    %option noyywrap
    %{
    enum Type{
        TFIREFOX = 0, TMSIE = 1
    };
    enum Type global_variable; /* the variable to store the parsing result */
    %}
    
    %%
    FIREFOX {global_variable = TFIREFOX; yyterminate();}
    MSIE {global_variable = TMSIE; yyterminate();}
    . {}
    %%
    
    int lex_strstr(char *buf, int n)
    {
        global_variable = -1;
        YY_BUFFER_STATE bs = yy_scan_buffer(buf, n);
        yy_switch_to_buffer(bs);
        yylex();
        return global_variable;
    }
    

    Store it in some file like resulte.l and use flex to compile it to get a c header file:

    flex -o head.h result.l
    

    And here's an example to show how this work:

    #include "head.h"
    int main()
    {
        {
            char buf[] = "this is a test MSIE string\0\0";
            printf("%d\n", lex_strstr(buf, (sizeof buf)));
        }
        {
            char buf[] = "this is a test FIREFOX string\0\0";
            printf("%d\n", lex_strstr(buf, (sizeof buf)));
        }
        {
            char buf[] = "this is a test MSIEFIREFOX string\0\0";
            printf("%d\n", lex_strstr(buf, (sizeof buf)));
        }
        {
            char buf[] = "this is a test MIEFIEFOXdfa\0\0";
            printf("%d\n", lex_strstr(buf, (sizeof buf)));
        }
    }
    

    with result:

    1
    0
    1
    -1