Search code examples
javasubstringdna-sequence

Compare Multiple Substrings


I'm attempting to write a basic dna sequencer. In that, given two sequences of the same length, it will output the strings which are the same, with a minimal length of 3. So input of

abcdef dfeabc

will return

1 abc

I am not sure how to go about solving the problem. I can compare the two strings, and see if they are completely equal. From there, i can compare length-1 string size, i.e. if dfeabc contains abcde. However, how can i get the program to do all possible strings, down to a minimal size of 3 characters? One issue is for the above example of length-1, I'd also have to do the string bcdef and compare that.


Solution

  • The naive way would be to get every single substring of string A and see if it's in string B.

    Here's the naive way of doing that:

    for ( int i = 0; i < a.length; i++ ) {
       for ( int j = i+1; j <= a.length; j++ ) {
           if (b.contains(a.substring(i, j))) {
               //if longer than last found match, record this match
           }
       }
    }
    

    The slightly more optimal way would be to look at longer substrings first, so that the first substring that matches is necessarily the longest.

    for ( int length = a.length; length > 0; length-- ) {
         for ( int i = 0; i + length < a.length; i++ ) {
             if ( b.contains(a.substring(i, i+length)) ) {
                 //this is the biggest match
             }
         }
    }