I have a string S
and a list of strings allItems
, allItems
contains strings that may have common "sub-words" but one element is never an extension of another:
//good, they both contain Fuzzy but Fuzzy isn't in allItems
const allItems = "FuzzyBunny", "FuzzyBear"
//not good BCS allItems[i] = whatever1+allItems[j]+whatever2
const allItems = "Fuzzy", "FuzzyBear", "FuzzyBunny" , "FuzzyBearBunny"
My goal is to find every match or approximate match of a string in allItems
in S
alongside their index(can be start or end, or ideally both). I've been searching for some algorithms to do this, similar to the aho-corasick algorithm, but that doesn't do approximate string matching.
Example:
S = "I love FuzzyBears and FuzzyDucks"
allItems = ["FuzzyBear", "FuzzyDuck"]
->
[
{ match: "FuzzyBear", matchIndex: 0, startIndex: 7, endIndex: 16 },
{ match: "FuzzyDuck", matchIndex: 1, startIndex: 22, endIndex: 31 }
]
I'm still pretty new to pattern matching so I'd appreciate some resources on how to code any of the recommended algorithms.
UPDATE: I've found a Fuzzified Aho-Corasick automata as described here, but I have very little idea on how to implement this in JS.I also have no issues with the code being slow as I use this for a one time run, and rarely need to do this often.
import difflib
s = "I love FuzzyBears and FuzzyDucks"
allitems = ["FuzzyBear", "FuzzyDuck"]
lst = []
for x in allitems:
d = {}
match = difflib.get_close_matches(x, s.split())
if match:
d["match"] = x
d["matchIndex"] = allitems.index(x)
d["startIndex"] = s.find(x)
d["endIndex"] = d["startIndex"] + len(x)
lst.append(d)
print(lst)
>>>
[{'match': 'FuzzyBear', 'matchIndex': 0, 'startIndex': 7, 'endIndex': 16}, {'match': 'FuzzyDuck', 'matchIndex': 1, 'startIndex': 22, 'endIndex': 31}]
import difflib
from itertools import combinations
s = "I love FuzzyBears and FuzzyDucks"
allitems = ["FuzzyBear", "FuzzyDuck"]
lst = []
for x in allitems:
d = {}
match = difflib.get_close_matches(x, [x[i:j] for i, j in combinations(range(len(x) + 1), r=2)])
if match:
d["match"] = x
d["matchIndex"] = allitems.index(x)
d["startIndex"] = s.find(x)
d["endIndex"] = d["startIndex"] + len(x)
lst.append(d)
print(lst)
>>>
[{'match': 'FuzzyBear', 'matchIndex': 0, 'startIndex': 7, 'endIndex': 16}, {'match': 'FuzzyDuck', 'matchIndex': 1, 'startIndex': 22, 'endIndex': 31}]
After a bunch of online searching:
const s = "I love FuzzyBears and FuzzyDucks";
const allitems = ["FuzzyBear", "FuzzyDuck"];
const lst = [];
for (const i of allitems) {
const x = {};
if (s.includes(i)) {
x["match"] = i;
x["matchIndex"] = allitems.indexOf(i);
x["startIndex"] = s.indexOf(i);
x["endIndex"] = s.indexOf(i) + i.length;
}
lst.push(x);
}
console.log(lst);
>>>
[ { match: 'FuzzyBear', matchIndex: 0, startIndex: 7, endIndex: 16 },
{ match: 'FuzzyDuck',
matchIndex: 1,
startIndex: 22,
endIndex: 31 } ]