java stop-words

String not checked correctly for stop words


I read stop words from a file and store them in a HashSet. I then check a String against that HashSet to see whether it contains any stop words.

If I put a single stop word, such as "the", in the String variable, the output is "Yes". However, if I use a phrase such as "Apple is it" or "it is an apple", the output is "No", even though both strings contain stop words.

Here's the whole program, containing two methods, one for reading the file and one for removing the stop words:

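import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;

// Reads whitespace-separated stop words from the file into a set.
private static HashSet<String> readFile() {
    HashSet<String> hset = new HashSet<String>();

    // try-with-resources closes the Scanner automatically and avoids a
    // NullPointerException in cleanup when the file cannot be opened.
    try (Scanner x = new Scanner(new File("StopWordsEnglish"))) {
        while (x.hasNext()) {
            hset.add(x.next());
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    return hset;
}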

public static void removeStopWords(){
    HashSet<String> hset = readFile();
    System.out.println(hset.size());
    System.out.println("Enter a word to search for: ");
    String search = "is";
    String s = search.toLowerCase();
    System.out.println(s);

    if (hset.contains(s)) {
        System.out.println("Yes");
    } else {
        System.out.println("No");
    }
}

Solution

  • I have a feeling I'm not reading your question correctly. But here goes.

    Assuming:

    String search = "it is an apple";
    

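    Then hset.contains(s) returns false, because it tests whether the whole string "it is an apple" is an element of the set, not whether any word inside it is. You should split the string and check each word individually.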

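    // Split on runs of whitespace so repeated spaces do not produce empty tokens.
    String[] split = search.trim().split("\\s+");
    boolean found = false;
    for (String s : split) {
        if (hset.contains(s.toLowerCase())) {
            found = true;
            break; // no need to continue once a stop word is found
        }
    }
    // Print the result once, after the loop, instead of once per word.
    System.out.println(found ? "Yes" : "No");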
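
If it helps, here is a minimal self-contained sketch of the same idea, with the per-word check pulled into a method and a second method that actually drops the stop words. The class name StopWordCheck, the method signatures, and the four-word sample set are assumptions made for illustration; in your program the set would come from readFile(). It also assumes words are separated by whitespace only, so punctuation such as "apple." would not match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StopWordCheck {

    // Returns true if at least one whitespace-separated word of text is in stopWords.
    static boolean containsStopWord(Set<String> stopWords, String text) {
        for (String word : text.trim().split("\\s+")) {
            if (stopWords.contains(word.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    // Returns text with every stop word dropped; the remaining words keep their order and case.
    static String removeStopWords(Set<String> stopWords, String text) {
        List<String> kept = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!stopWords.contains(word.toLowerCase())) {
                kept.add(word);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Hypothetical sample set; the real program would use the set returned by readFile().
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "is", "it", "an"));

        // The whole phrase is not an element of the set, so this prints false.
        System.out.println(stopWords.contains("it is an apple"));
        // Checking word by word finds "it", so this prints true.
        System.out.println(containsStopWord(stopWords, "it is an apple"));
        // "is" and "it" are dropped, so this prints Apple.
        System.out.println(removeStopWords(stopWords, "Apple is it"));
    }
}
```

Lower-casing each word before the lookup only works if the words in the file are stored in lower case too, which is what your existing toLowerCase() call already assumes.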