I'm trying to write a JavaScript function to find the indices of all occurrences of a word in a text document. This is what I have so far:
// Finds all occurrences of string 'needle' in string 'haystack' (case-insensitive)
function getMatches(haystack, needle) {
    if (needle && haystack) {
        var matches = [], ind = 0, l = needle.length;
        var t = haystack.toLowerCase();
        var n = needle.toLowerCase();
        while (true) {
            // Search for the next occurrence, starting at the current offset
            ind = t.indexOf(n, ind);
            if (ind === -1) break;
            matches.push(ind);
            // Skip past this occurrence before searching again
            ind += l;
        }
        return matches;
    }
}
However, this also matches the word when it is only part of a longer word. For example, if the needle is "book" and the haystack is "Tom wrote a book. The book's name is Facebook for dummies", the result contains the indices of 'book', 'book's' and 'Facebook', when I want only the index of the standalone 'book'. How can I accomplish this? Any help is appreciated.
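Here's the regex I propose to fix your problem:
/\bbook\b((?!\W(?=\w))|(?=\s))/gi
Try it with the exec() method. The \b word boundaries keep the pattern from matching inside longer words such as "Facebook" or "booklet", and the trailing lookaheads reject a match that is followed by a non-whitespace, non-word character and then a word character, as in "book's". Wrapped in a function: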
function getMatches(needle, haystack) {
    var myRe = new RegExp("\\b" + needle + "\\b((?!\\W(?=\\w))|(?=\\s))", "gi"),
        myArray, myResult = [];
    while ((myArray = myRe.exec(haystack)) !== null) {
        myResult.push(myArray.index);
    }
    return myResult;
}
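As a quick sanity check, here is a minimal sketch that runs this function against the example sentence from the question. The function body is repeated only so the snippet runs on its own; note that it takes the needle first, unlike the question's version. The variable name `sentence` is just for illustration.

```javascript
// Same function as in the answer, repeated so this snippet is self-contained.
function getMatches(needle, haystack) {
    var myRe = new RegExp("\\b" + needle + "\\b((?!\\W(?=\\w))|(?=\\s))", "gi"),
        myArray, myResult = [];
    while ((myArray = myRe.exec(haystack)) !== null) {
        myResult.push(myArray.index);
    }
    return myResult;
}

// Example sentence from the question (variable name is illustrative).
var sentence = "Tom wrote a book. The book's name is Facebook for dummies";

// Only the standalone "book" at index 12 is reported;
// "book's" (index 22) and the "book" inside "Facebook" are skipped.
console.log(getMatches("book", sentence)); // logs an array containing only 12
```

The second occurrence is rejected by the lookahead (an apostrophe followed by a letter), and the third by the leading `\b`, since there is no word boundary between "e" and "b" in "Facebook".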
Edit
I've edited the regexp to account for words like "booklet" as well. I've also reformatted my answer to be similar to your function.
You can do some testing here
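One caveat: the needle is concatenated into the pattern as-is, so any regex metacharacters in it (for example `.` or `(`) are interpreted as pattern syntax. A minimal sketch of one way to guard against that, assuming the needle is meant as literal text, is to escape it first. `escapeRegExp` and `getLiteralMatches` are hypothetical helper names, not part of the original answer.

```javascript
// Hypothetical helper: backslash-escape characters that have a special
// meaning in a regular expression so the needle is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same approach as the answer's function, but with the needle escaped.
function getLiteralMatches(needle, haystack) {
    var pattern = "\\b" + escapeRegExp(needle) + "\\b((?!\\W(?=\\w))|(?=\\s))",
        re = new RegExp(pattern, "gi"),
        match, result = [];
    while ((match = re.exec(haystack)) !== null) {
        result.push(match.index);
    }
    return result;
}

// Without escaping, the "." in "a.b" would also match the "x" in "axb".
console.log(getLiteralMatches("a.b", "axb a.b")); // logs an array containing only 4
```

Because the pattern still relies on `\b`, needles that begin or end with a non-word character (such as "c++") may not match as expected; that case would need a different boundary check.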