java regex count

How to match a text file against multiple regex patterns and count the number of occurrences of each pattern?


I want to find and count the occurrences of the words unit, device, method and module in each line of a text file, with a separate count for each word. This is what I have so far, but I don't know how to use multiple patterns, or how to count the occurrences of each word in a line separately. Right now it only counts the occurrences of all the words together for each line. Thank you in advance!

private void countPaterns() throws IOException {

    Pattern nom = Pattern.compile("unit|device|method|module|material|process|system");

    String str = null;      

    BufferedReader r = new BufferedReader(new FileReader("D:/test/test1.txt")); 

    while ((str = r.readLine()) != null) {
        Matcher matcher = nom.matcher(str);

        int countnomen = 0;
        while (matcher.find()) {
            countnomen++;
        }

        //intList.add(countnomen);
        System.out.println(countnomen + " of them are the word System");
    }
    r.close();
    //return intList;
}

Solution

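  • It is better to wrap the alternation in word boundaries (\b), so that only whole words match, and to use a map that keeps a separate count for each matched keyword. The code below accumulates the counts over the whole file; to get counts per line, create the map inside the outer loop instead.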

    // \b on both sides makes the pattern match whole words only.
    Pattern nom = Pattern.compile("\\b(unit|device|method|module|material|process|system)\\b");
    
    // Maps each matched keyword to the number of times it was found.
    Map<String, Integer> counts = new HashMap<>();
    
    // try-with-resources closes the reader even if reading fails.
    try (BufferedReader r = new BufferedReader(new FileReader("D:/test/test1.txt"))) {
        String str;
        while ((str = r.readLine()) != null) {
            Matcher matcher = nom.matcher(str);
    
            while (matcher.find()) {
                // group(1) is the keyword that matched; add 1 to its count.
                counts.merge(matcher.group(1), 1, Integer::sum);
            }
        }
    }
    
    System.out.println(counts);
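
Since the question asks for a separate count per line, here is a minimal sketch of that variant. It is only one possible arrangement, not the answer's original code: the class name PerLineKeywordCounter and the methods countLine and countPerLine are made up for illustration, and the sample text in main stands in for the real file (pass a FileReader for the file path instead of the StringReader to read from disk).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class PerLineKeywordCounter {

    // Keywords from the question; \b restricts matches to whole words.
    private static final Pattern KEYWORDS =
            Pattern.compile("\\b(unit|device|method|module|material|process|system)\\b");

    /** Counts how often each keyword occurs in a single line. */
    static Map<String, Integer> countLine(String line) {
        // LinkedHashMap keeps keywords in the order they were first seen.
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher matcher = KEYWORDS.matcher(line);
        while (matcher.find()) {
            counts.merge(matcher.group(1), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns one count map per input line, in input order. */
    static List<Map<String, Integer>> countPerLine(Reader source) throws IOException {
        List<Map<String, Integer>> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(countLine(line));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Sample input standing in for the file (an assumption for this sketch).
        String text = "the unit and the device unit\nno keywords here\nsystem of units";
        List<Map<String, Integer>> perLine = countPerLine(new StringReader(text));
        for (int i = 0; i < perLine.size(); i++) {
            System.out.println("line " + (i + 1) + ": " + perLine.get(i));
        }
    }
}
```

Each line gets its own map, so keywords that do not occur in a line are simply absent from that line's map. The pattern is case-sensitive as written; passing Pattern.CASE_INSENSITIVE to Pattern.compile would also match "Unit" or "SYSTEM", though the map keys would then keep the casing found in the text.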