I have found an algorithm for Longest Common Substring. It is usually done using dynamic programming
, using a 2-D array of size mxn
where m
and n
are lengths of the two strings under consideration.
I will construct the following matrix for the two strings.
M[i][j] = 1 if s1[i]==s2[j] else 0.
For example, if the strings are: abcxy
and pqaabx
The matrix looks as follows:
a b c x y
p 0 0 0 0 0
q 0 0 0 0 0
a 1 0 0 0 0
a 1 0 0 0 0
b 0 1 0 0 0
x 0 0 0 1 0
Now, I search for a maximal continuous sequence of 1
s in every diagonal which is in top-left to bottom-right direction.
The maximum value among these will be the answer.
I can perform the above operation without using the array explicitly. The time-complexity is still O(M*N)
. So, there is no need of memory.
Can anyone point me where I am going wrong?
Your method is correct. For proof suppose the longest common substring for S1 and S2 was from S1[i..j] and S2[p..q]. this implies S1[i+k] = S2[p+k]
These all lie on the diagonal starting from (i,p).
The dynamic programming solution does the same thing but instead of computing the table first and going through diagonal paths it computes the table depending on it's diagonal parent plus whether or not they match.
EDITED
On your comment on the wikipedia solution using additional memory. It's there only for clarity. In principle you need only two rows of the matrix in the wikipedia solution and keep the current maximum count in one variable. This is correct since for any (i,j)th entry in the matrix
M(i,j) = 1 + M(i-1, j-1) (if s1[i] == s2[j])
as you can see the current row elements depend only on the elements of the immediately upper row.