javascript, node.js, regex, wordnet

Use a regex to find questions containing 2 or more of a set of words, by groups or other means


I'm looking for a regex that can improve the search results from a question bank. Example question: How do you start a conversation with a friend?

Use -> Regex String

Question bank:

  1. Start a conversation with a friend
  2. Build a conversation with my friend
  3. Start a dialog with my friend
  4. Start a game
  5. Conversation is started by saying Hello, person name

Expected result: 1, 2, and 3 are the answers; 4 and 5 are not related.

For now, I'm using WordNet to get the nouns and verbs and then querying to find the result. Is it possible to search for questions in which at least 2 of the words are found? Current regex: /(?=.*\bstart\b)(?=.*\bconversation\b)(?=.*\bfriend\b).*/gi, which matches only when all of the words are found.


Solution

  • This is much easier to do with a simple indexOf check than with a regex, and I think it would be faster as well.

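    // Assumed to run inside a function that receives questionSentence.
    const words = [...];
    let matchedWord = 0;
    for (const word of words) {
      if (questionSentence.indexOf(word) > -1) matchedWord += 1;
      // Stop as soon as two of the words have been found.
      if (matchedWord > 1) {
        return questionSentence;
      }
    }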
    

    The complexity arises because you need to keep track of how many words have already matched, and I really would not use a regex for that. You can even wrap it in a function like this:

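    // Returns true once at least `threshold` of the words occur in the sentence.
    // indexOf matches substrings, so "start" also matches "started".
    function matchWords(sentence, words, threshold) {
        const lowercaseSentence = sentence.toLowerCase();
        let matchedWord = 0;
        for (const word of words) {
          if (lowercaseSentence.indexOf(word.toLowerCase()) > -1) {
              matchedWord += 1;
          }
          if (matchedWord >= threshold) {
            return true;
          }
        }
        return false;
    }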
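
    One caveat with the indexOf version: it matches substrings, so "start" is also found inside "started", and question 5 from the bank would reach a threshold of 2. If whole-word matching is what you want (as the \b in the original regex suggests), a possible variant is sketched below. The names escapeRegExp, matchWholeWords, questionBank, and related are made up for this illustration, and the word list is assumed to be the three words from the question.

```javascript
// Escape regex metacharacters so a search word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical variant of matchWords: a word only counts when it
// appears as a whole word (case-insensitive) in the sentence.
function matchWholeWords(sentence, words, threshold) {
  let matched = 0;
  for (const word of words) {
    const pattern = new RegExp(`\\b${escapeRegExp(word)}\\b`, "i");
    if (pattern.test(sentence)) {
      matched += 1;
    }
    if (matched >= threshold) {
      return true;
    }
  }
  return false;
}

// The question bank and search words from the question above.
const questionBank = [
  "Start a conversation with a friend",
  "Build a conversation with my friend",
  "Start a dialog with my friend",
  "Start a game",
  "Conversation is started by saying Hello, person name",
];
const words = ["start", "conversation", "friend"];

// Keep only the questions that contain at least 2 of the words.
const related = questionBank.filter((q) => matchWholeWords(q, words, 2));
console.log(related);
```

    With this input, the filter keeps questions 1, 2, and 3 and drops 4 and 5. Note that \b in JavaScript treats only ASCII letters, digits, and underscore as word characters, so this sketch assumes plain English search words.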