Finding substrings in C (Question 8.16 from C How to Program)


I am trying to implement Question 8.16 from C How to Program. The question is below:

(Searching for Substrings) Write a program that inputs a line of text and a search string from the keyboard. Using function strstr, locate the first occurrence of the search string in the line of text, and assign the location to variable searchPtr of type char *. If the search string is found, print the remainder of the line of text beginning with the search string. Then, use strstr again to locate the next occurrence of the search string in the line of text. If a second occurrence is found, print the remainder of the line of text beginning with the second occurrence. [Hint: The second call to strstr should contain searchPtr + 1 as its first argument.]

Here is my code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define SIZE 128

int main(void){
    char s[SIZE], search_string[SIZE];
    char *searchPtr;
    fgets(s, SIZE, stdin);
    fgets(search_string, SIZE, stdin);
    
    searchPtr = strstr(s, search_string);
    //printf("%s", searchPtr);

    if(searchPtr){
        printf("%s%s\n", "The text line beginning with the first occurrence of: ", search_string);
        searchPtr = strstr(searchPtr+1, search_string);
        if(searchPtr){
            printf("%s%s\n","The text line beginning with the second occurrence of: ", search_string);  
        } else{
            printf("%s", "The string to be searched just appeared once.\n"); 
        }
    } else {
        printf("Search string is not found in the string.");
    }
}

Here is my input for string s:

hello world world

Here is my input for search_string:

world

Here is my output:

The text line beginning with the first occurrence of: world

The string to be searched just appeared once.

But the output should have been:

The text line beginning with the first occurrence of: world

The text line beginning with the second occurrence of: world


Solution

  • After doing fgets, we have to strip the newline (\n) character.

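    Otherwise, search_string keeps the newline at its end, and the search fails unless the match sits at the very end of the line being searched (where it is also followed by a newline). That is why the original program reported only one occurrence for the sample input: its first strstr call matched the final world, and nothing was left to find after it.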
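
    As a minimal sketch of the newline problem (the helper names strip_newline and first_match_offset are hypothetical, not part of the original code), the following shows where strstr lands with and without the trailing newline:

    ```c
    #include <stdio.h>
    #include <string.h>

    // strip_newline -- hypothetical helper: cut the buffer at the first '\n'
    // (if fgets stored one) and return the resulting length
    size_t
    strip_newline(char *buf)
    {
        size_t len = strcspn(buf, "\n");    // index of '\n', or of the terminator

        buf[len] = '\0';
        return len;
    }

    // first_match_offset -- hypothetical helper: offset of the first match of
    // search within line, or -1 if there is no match
    long
    first_match_offset(const char *line, const char *search)
    {
        const char *p = strstr(line, search);

        return (p != NULL) ? (long) (p - line) : -1;
    }
    ```

    With the buffers exactly as fgets fills them ("hello world world\n" and "world\n"), first_match_offset should report the second world. After calling strip_newline on both buffers, it should report the first one.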

    The code is "hardwired" for [at most] two matches within a line/buffer. This can be generalized to count an arbitrary number of matches using a loop.

    This requires that searchPtr be initialized to s and then passed to strstr as its first argument on every iteration.
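
    One way to picture that loop, as a sketch only: the hypothetical helper nth_occurrence below starts searchPtr at s and keeps calling strstr until it reaches the requested match. It assumes both strings have already had their newlines stripped, and it treats an empty search string as "not found" (my own choice, to keep the loop from walking past the terminator).

    ```c
    #include <stddef.h>
    #include <string.h>

    // nth_occurrence -- hypothetical helper: return a pointer to the n-th
    // (1-based) occurrence of search in s, or NULL if there are fewer than n
    const char *
    nth_occurrence(const char *s, const char *search, int n)
    {
        const char *searchPtr = s;      // point to start of line buffer

        // an empty search string matches everywhere -- treat it as not found
        if (n < 1 || search[0] == '\0')
            return NULL;

        while (1) {
            searchPtr = strstr(searchPtr, search);
            if (searchPtr == NULL)
                return NULL;            // ran out of matches

            if (--n == 0)
                return searchPtr;       // this is the match we wanted

            ++searchPtr;                // resume one char past this match
        }
    }
    ```

    Printing the result with %s gives the remainder of the line beginning at that match, which is what the exercise asks for with n set to 1 and then 2.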


    Note that the original code increments searchPtr by 1 to find a subsequent match.

    This is slower than incrementing by the length of search_string.

    But, incrementing by the string length can produce different results if (e.g.) the string to search for is:

    aa
    

    And, the string to search within is (e.g.):

    aaaaaa
    
    1. Incrementing by the string length will produce 3 matches.
    2. Incrementing by one char will produce 5 matches.
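
    The difference between the two step sizes can be checked with a small sketch. The function count_matches and its overlap flag are hypothetical names introduced here for illustration. The step is limited to either 1 or the search string length so that the pointer never moves past the end of the buffer.

    ```c
    #include <stddef.h>
    #include <string.h>

    // count_matches -- hypothetical helper: count occurrences of search in s
    // overlap != 0: advance one char after each match (overlaps are counted)
    // overlap == 0: advance by the search string length (no overlaps)
    int
    count_matches(const char *s, const char *search, int overlap)
    {
        size_t len = strlen(search);
        size_t step = overlap ? 1 : len;
        int match = 0;

        // an empty search string would never advance -- report no matches
        if (len == 0)
            return 0;

        for (const char *p = strstr(s, search); p != NULL;
             p = strstr(p + step, search))
            ++match;

        return match;
    }
    ```

    For the search string aa and the buffer aaaaaa, count_matches should return 5 with overlap set and 3 without it. For hello world world and world, both modes should agree on 2.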

    Here's the refactored code.

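    I've added code to allow for multiple search strings and lines to search. Each group of input starts with the search string, followed by one or more lines to search within. Groups are separated by a blank line.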

    I've annotated it:

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    #define SIZE        1000
    
    // dosearch -- perform a single search
    // RETURNS: 1=more to do, 0=EOF
    int
    dosearch(void)
    {
        char s[SIZE];
        char search_string[SIZE];
        char *searchPtr;
        int match;
        int moreflg;
    
        do {
            // get search string -- stop on EOF
            moreflg = (fgets(search_string, SIZE, stdin) != NULL);
            if (! moreflg)
                break;
    
            // strip newline and get string length
            size_t len = strcspn(search_string,"\n");
            search_string[len] = 0;
    
            // show the [small] string we wish to search for
            printf("\n");
            printf("String to search for is: %s\n",search_string);
    
            // skipping by one char is slow -- we could skip by the length
            // of the search string but this would work differently with:
            //   search string: aa
            //   buffer: aaaaaa
            // skipping by string length would produce 3 matches
            // skipping by one char would produce 5 matches
            // so, better to use 1
    #if 1
            len = 1;
    #endif
    
            while (1) {
                // get line to search -- stop on EOF
                moreflg = (fgets(s, SIZE, stdin) != NULL);
                if (! moreflg)
                    break;
    
                // strip newline
                s[strcspn(s,"\n")] = 0;
    
                // blank line means start new search -- caller will loop for us
                if (s[0] == 0)
                    break;
    
                printf("String to search within is: %s\n",s);
    
                match = 0;
    
                // point to start of line buffer
                searchPtr = s;
    
                while (1) {
                    // search for next occurrence of string
                    searchPtr = strstr(searchPtr,search_string);
                    if (searchPtr == NULL)
                        break;
    
                    // increase the number of matches
                    ++match;
    
                    printf("Match #%d found at: %s\n",match,searchPtr);
    
                    // skip over the match we just made
                    searchPtr += len;
                }
    
                printf("A match occurred %d times\n",match);
            }
        } while (0);
    
        return moreflg;
    }
    
    int
    main(void)
    {
    
        while (1) {
            if (! dosearch())
                break;
        }
    }
    

    Here is some sample input data:

    world
    hello world world
    world is not enough
    world
    
    quick
    the quick brown fox jumped over lazy dogs quickly
    
    aa
    aaaaaa
    

    Here is the program output:

    
    String to search for is: world
    String to search within is: hello world world
    Match #1 found at: world world
    Match #2 found at: world
    A match occurred 2 times
    String to search within is: world is not enough
    Match #1 found at: world is not enough
    A match occurred 1 times
    String to search within is: world
    Match #1 found at: world
    A match occurred 1 times
    
    String to search for is: quick
    String to search within is: the quick brown fox jumped over lazy dogs quickly
    Match #1 found at: quick brown fox jumped over lazy dogs quickly
    Match #2 found at: quickly
    A match occurred 2 times
    
    String to search for is: aa
    String to search within is: aaaaaa
    Match #1 found at: aaaaaa
    Match #2 found at: aaaaa
    Match #3 found at: aaaa
    Match #4 found at: aaa
    Match #5 found at: aa
    A match occurred 5 times