Tags: c, xml, xml-parsing

How do I search for a substring in a char* from a starting index using C?


I have the following loop which calls the parseXML() function:

...

char buffer[MAX_BUFSIZE];
memset(buffer, 0, sizeof(buffer));
int bytes_received = tcp_read(socket, buffer, sizeof(buffer));

...

int bytes_read_total = 0;
while (bytes_read_total < bytes_received) {
        
    int bytes_read = 0;
    Person* person = parseXML(buffer, bytes_received - bytes_read_total, &bytes_read);

    if (person == NULL) break;

    bytes_read_total += bytes_read;
    savePerson(person);

}

and here is the start of my parseXML() function:

Person* parseXML(char* buffer, int bufSize, int* bytesRead) {
    
    char* startTag = strstr(buffer, "<person>");
    char* endTag = strstr(buffer, "</person>");
    if (startTag == NULL || endTag == NULL) {
        return NULL;
    }

    Person* person = newPerson();
    
    int personLength = endTag + strlen("</person>") - startTag;

    char* personBuffer = (char*)malloc(personLength + 1);
    memcpy(personBuffer, startTag, personLength);
    personBuffer[personLength] = '\0';

    ...

    free(personBuffer);
    *bytesRead = personLength + 1;
    return person;

Currently, strstr() finds the first person in my XML on every iteration instead of starting the search from the bytesRead offset. How can I fix this?


Solution

  • strstr accepts a char pointer as its first parameter and stops searching when it reaches a null character ('\0', the end of the string).

    Pointers can be used in arithmetic, for example:

        int nums[2] = {1, 2};
        int *secondnum = nums + 1;  // "nums" array decays (turns) into a pointer
        printf("%d\n", *secondnum);  // prints "2"
    

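    As such, to search a string starting from an offset using strstr, all you need to do is advance your pointer by that offset. In your loop, that means passing buffer + bytes_read_total to parseXML() instead of buffer: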

        char *haystack = "foo 1, foo 2";
        char *needle = "foo";
    
        char *first_foo = strstr(haystack, needle);
        char *second_foo = strstr(first_foo + strlen(needle), needle);
        printf("%td\n", second_foo - haystack);  // prints "7", the position of the second "foo"
    

    When strstr doesn't find the substring, it returns a NULL pointer. You can check whether the substring was found like this:

        char *found = strstr(haystack, needle);
        if (found != NULL) {
            // found is a pointer to the position of the substring
        } else {
            // substring not found
        }