
How to check if any word from a list is in a string as a whole word (i.e. not as part of another word)?


I need to check whether any string from a list occurs in the input string as a whole word; it should not match when it appears only as part of a longer word.

An incorrect attempt:

String input = "i was hoping the number";
String[] valid = new String[] { "nip", "pin" };
if (Arrays.stream(valid).anyMatch(input::contains)) {
    System.out.println("valid");
}

The output is valid, which is not correct: contains finds pin inside the word hoping. It should match only when pin appears as a separate word.
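
To see where the false positive comes from, here is a minimal sketch that reports where the plain substring search finds a candidate word. The class and method names are hypothetical and are not part of the original attempt.

```java
// Hypothetical demo class, for illustration only.
final class SubstringFalsePositive {
    // Returns the index at which `word` occurs in `input` as a plain substring, or -1.
    static int substringIndex(String input, String word) {
        return input.indexOf(word);
    }

    public static void main(String[] args) {
        String input = "i was hoping the number";
        int at = substringIndex(input, "pin");
        // The hit lies inside "hoping", which is why contains(...) reports a match.
        System.out.println("pin found at index " + at + ", inside: " + input.substring(at - 2, at + 4));
        // prints: pin found at index 8, inside: hoping
    }
}
```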


Solution

  • Wrap each word in regex word boundaries (\b) and search with Matcher.find():

    import java.util.Arrays;
    import java.util.regex.Pattern;
    
    public class Main {
        public static void main(String[] args) {
            String input = "i was hoping the number";
            String[] valid = new String[] { "nip", "pin" };
            if (Arrays.stream(valid).anyMatch(p -> Pattern.compile("\\b" + p + "\\b").matcher(input).find())) {
                System.out.println("valid");
            }
        }
    }
    

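    Note that \b matches a word boundary. Placing it before and after each word means the word matches only when it is not part of a longer word. The words are concatenated into the regex as-is, so wrap them in Pattern.quote(...) if they may contain regex metacharacters.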

    Some more tests:

    import java.util.Arrays;
    import java.util.regex.Pattern;
    
    public class Main {
        public static void main(String[] args) {
            String[] testStrings = { "i was hoping the number", "my pin is 123", "the word, turnip ends with nip",
                    "turnip is a vegetable" };
            String[] valid = new String[] { "nip", "pin" };
            for (String input : testStrings) {
                if (Arrays.stream(valid).anyMatch(p -> Pattern.compile("\\b" + p + "\\b").matcher(input).find())) {
                    System.out.println(input + " => " + "valid");
                } else {
                    System.out.println(input + " => " + "invalid");
                }
            }
        }
    }
    

    Output:

    i was hoping the number => invalid
    my pin is 123 => valid
    the word, turnip ends with nip => valid
    turnip is a vegetable => invalid
    

    Solution without using Stream API:

    import java.util.regex.Pattern;
    
    public class Main {
        public static void main(String[] args) {
            String input = "i was hoping the number";
            String[] valid = new String[] { "nip", "pin" };
            for (String toBeMatched : valid) {
                if (Pattern.compile("\\b" + toBeMatched + "\\b").matcher(input).find()) {
                    System.out.println("valid");
                    break; // stop at the first match, like anyMatch, so "valid" prints at most once
                }
            }
        }
    }
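
As a possible refinement of the \b approach above, the word list can be folded into a single pattern that is compiled once instead of once per word and per input. The sketch below is one way to do that, not the method from the answer. WholeWordMatcher, compile and containsAny are hypothetical names, and Pattern.quote is used on the assumption that the words should be matched literally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper; class and method names are illustrative only.
final class WholeWordMatcher {
    // Builds one pattern of the form \b(?:word1|word2|...)\b.
    // Each word is quoted so regex metacharacters in it are matched literally.
    static Pattern compile(String... words) {
        String alternation = Arrays.stream(words)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    // True if any of the words occurs in the input as a whole word.
    static boolean containsAny(String input, String... words) {
        if (words.length == 0) {
            return false; // an empty alternation would match at any word boundary
        }
        return compile(words).matcher(input).find();
    }

    public static void main(String[] args) {
        Pattern pattern = compile("nip", "pin"); // compiled once, reused for every input
        String[] testStrings = { "i was hoping the number", "my pin is 123",
                "the word, turnip ends with nip", "turnip is a vegetable" };
        for (String input : testStrings) {
            System.out.println(input + " => " + (pattern.matcher(input).find() ? "valid" : "invalid"));
        }
    }
}
```

Note that \b only behaves as a whole-word check when each word starts and ends with a word character (letter, digit or underscore). A word such as "c++" would need a different boundary test.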