I'm trying to create a script that looks through a list of strings (or files) and reports on the substrings that are most common between them.
For example:
I'd like the script to tell me what the common elements between the strings are, above a certain length threshold (for example, 5 characters).
Ideally I'd be told
If functions to do this exist in technologies I'm familiar with (SQL, JavaScript, PHP, Ruby or Bash), I'll be extremely happy...
Many thanks,
Jack
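This is known as the longest common subsequence problem, and it is hard in general. For two strings it has an O(nm) dynamic-programming solution, but for an arbitrary number of strings it is NP-hard. Strictly speaking, if the shared pieces must be contiguous (as a 5-character threshold suggests), the closely related problem is the longest common substring problem.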
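If the shared pieces must be contiguous runs of at least N characters, a brute-force sketch is small enough to write directly. It relies on one fact: if every string shares a substring of length n+1, they also share one of length n. That means you can grow the length one character at a time and stop as soon as nothing is shared. The names `substrings_of_length` and `common_substrings` are hypothetical, chosen for this illustration. This is not an efficient algorithm. Generalized suffix trees do much better on large inputs, but this version is fine for a handful of short strings.

```python
from typing import Iterable


def substrings_of_length(s: str, k: int) -> set[str]:
    """Return every contiguous substring of s that is exactly k characters long."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def common_substrings(strings: Iterable[str], min_len: int) -> set[str]:
    """Return the maximal substrings of length >= min_len shared by ALL strings.

    Brute force: try lengths min_len, min_len + 1, ... and keep only the
    substrings present in every input. A common substring of length n + 1
    implies common substrings of length n, so we stop at the first length
    with no common substring.
    """
    items = list(strings)
    if not items:
        return set()
    found: set[str] = set()
    shortest = min(len(s) for s in items)
    length = min_len
    while length <= shortest:
        common = substrings_of_length(items[0], length)
        for s in items[1:]:
            common &= substrings_of_length(s, length)
        if not common:
            break
        found |= common
        length += 1
    # Keep only maximal results: drop anything contained in a longer result.
    return {s for s in found if not any(s != t and s in t for t in found)}


print(common_substrings(["hello world", "yellow world"], 5))
```

With a threshold of 5, `"ello"` is ignored because it is too short, and `" worl"` and `"world"` are folded into the longer `" world"`.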
Here is a Python implementation of the algorithm using dynamic programming: http://www.algorithmist.com/index.php/Longest_Common_Subsequence
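For reference, here is a minimal sketch of the standard two-string dynamic-programming approach. It is not a transcription of the linked page's code. The function name `longest_common_subsequence` is my own. Note that a subsequence does not have to be contiguous, so this answers a slightly different question than "which substrings are shared".

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right cell to recover one optimal subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(longest_common_subsequence("abcde", "ace"))
```

When several subsequences tie for longest, the tie-breaking in the walk-back decides which one is returned.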
I don't think any standard library (C, Java, PHP, Python, JavaScript, Ruby, etc.) comes with such a function, but you may find implementations here: http://www.google.com/codesearch?q=%22longest+common+subsequence%22
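One partial exception for the contiguous case: Python's standard `difflib` module has `SequenceMatcher.find_longest_match`, which returns the longest matching contiguous block between two sequences. The sketch below assumes you only need to compare two strings at a time. The wrapper name `longest_shared_block` is hypothetical.

```python
from difflib import SequenceMatcher


def longest_shared_block(a: str, b: str) -> str:
    """Return the longest contiguous block that a and b have in common."""
    # autojunk=False turns off difflib's "popular element" heuristic, which
    # can otherwise hide matches when b is 200 or more characters long.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # Explicit bounds keep this working on Python versions before 3.9,
    # where find_longest_match had no default arguments.
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


print(repr(longest_shared_block("hello world", "yellow world")))
```

It reports only the single longest block (the earliest one in `a` on ties) and does not apply a length threshold or handle more than two inputs. For those, you still need something like the brute-force or suffix-tree approaches.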