
Comparing substring of a character array with another character array in C


I have two character arrays called arraypi and arraye, containing digits that I read from a file. Each has 1,000,000 characters. I need to start with the first character of arraye (in this case, 7) and search for it in arraypi. If 7 exists in arraypi, I then have to search for the next longer prefix of arraye (in this case, 71), then for 718, 7182, and so on, until the prefix no longer occurs in arraypi. Finally, I have to store the length of the longest prefix found in an integer variable and print it.

It may be worth mentioning that arraypi contains a newline every 50 characters, whereas arraye contains one every 80. I don't think that will be a problem, right?

I have tried to think of a way to accomplish this, but so far I haven't come up with anything.


Solution

  • I'm not entirely sure I understood the question correctly. This is how I picture it:

    • Assume the whole of arraypi is open in a browser
    • You press Ctrl+F to open the find box
    • You start typing the contents of arraye character by character until the find box turns red (no match)
    • You want the number of characters you were able to type before that happened

    If that's right, then an algorithm like the following should do the trick:

    #include <stdio.h>
    #define iswhitespace(X) ((X) == '\n' || (X) == ' ' || (X) == '\t')
    
    int main( ) {
    
        char e[1000] = "somet\n\nhing";
        char pi[1000] = "some other t\nhing\t som\neth\n\ning";
    
        int longestlen = 0;
        int longestx = 0;
        int pix = 0;
        int ex = 0;
        int piwhitespace = 0;       // whitespace skipped in pi during the current attempt
        int ewhitespace = 0;        // whitespace skipped in e during the current attempt
    
        while ( pix + ex + piwhitespace < 1000 ) {
    
            // added the following 4 lines to make it whitespace insensitive
            while ( iswhitespace(e[ex + ewhitespace]) )
                ewhitespace++;
            while ( iswhitespace(pi[pix + ex + piwhitespace]) )
                piwhitespace++;
    
            if ( e[ex + ewhitespace] != '\0' && pi[pix + ex + piwhitespace] != '\0' && pi[pix + ex + piwhitespace] == e[ex + ewhitespace] ) {
                // the following 4 lines are for obtaining correct longestx value
                if ( ex == 0 ) {
                    pix += piwhitespace;
                    piwhitespace = 0;
                }
                ex++;
            }
            else {
                if ( ex > longestlen ) {
                    longestlen = ex;
                    longestx = pix;
                }
                // Advance the search start by exactly one character. Also adding
                // piwhitespace here would skip candidate start positions whenever
                // the failed attempt had stepped over whitespace inside pi.
                pix++;
                piwhitespace = 0;
                ex = 0;
                ewhitespace = 0;
            }
        }
    
        printf( "Longest sqn is %d chars long starting at %d", longestlen, longestx + 1 );
    
        putchar( 10 );
        return 0;
    }
    

    What's happening there: the loop first searches for a starting point for a match. Until it finds one, it increments the index into the array being examined (pix). Once it finds a starting point, it increments the index into the array containing the search term (ex) instead, keeping pix constant.

    This continues until the next mismatch. At that point the length of the match is checked against the current record, the search-term index is reset, and the index into the examined array starts advancing again.
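
    The two phases described above can also be packaged as a small function. The following is only a sketch under assumptions: the name longest_prefix_match and its parameters are made up for illustration, and both strings are assumed to be NUL-terminated and already free of whitespace.

    ```c
    #include <stddef.h>

    /* Length of the longest prefix of `needle` that occurs somewhere in `hay`.
     * If `start` is non-NULL, it receives the 0-based offset of the first
     * occurrence of that longest prefix (0 if nothing matches).
     * Assumption: neither string contains whitespace. */
    size_t longest_prefix_match(const char *hay, const char *needle, size_t *start)
    {
        size_t best = 0, best_at = 0;

        for (size_t i = 0; hay[i] != '\0'; i++) {
            size_t k = 0;
            /* Extend the match from position i for as long as it holds.
             * Reaching the end of hay counts as a mismatch. */
            while (needle[k] != '\0' && hay[i + k] == needle[k])
                k++;
            /* Record check: keep the longest match seen so far. */
            if (k > best) {
                best = k;
                best_at = i;
            }
        }

        if (start != NULL)
            *start = best_at;
        return best;
    }
    ```

    With the example strings from the program above, whitespace removed, longest_prefix_match("someothertthingsomething", "something", &at) returns 9 and sets at to 15. The worst case is quadratic in the input sizes, although with digit strings most attempts should fail after a character or two.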

    I hope this helps, ideally beyond just this one-off problem.

    Edit:

    Changed the code to disregard white space characters.
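
    An alternative to skipping whitespace during the comparison would be to strip it from both arrays once, right after reading the files, so that the newlines every 50 and every 80 characters no longer matter. A minimal sketch, assuming the buffers are NUL-terminated and writable (the name strip_whitespace is hypothetical):

    ```c
    #include <ctype.h>
    #include <stddef.h>

    /* Remove every whitespace character from `s` in place.
     * Returns the length of the resulting string. */
    size_t strip_whitespace(char *s)
    {
        size_t out = 0;

        for (size_t in = 0; s[in] != '\0'; in++) {
            /* Cast to unsigned char: passing a negative char to isspace()
             * is undefined behavior. */
            if (!isspace((unsigned char)s[in]))
                s[out++] = s[in];
        }
        s[out] = '\0';
        return out;
    }
    ```

    Calling this on "31\n41 5\t9\r\n" leaves "314159" in the buffer and returns 6. One trade-off: positions reported afterwards refer to the stripped text, not to offsets in the original file.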