Tags: java, java.util.scanner, stringtokenizer

Java - Counting words, lines, and characters from a file


I'm trying to read in words from a file. I need to count the words, lines, and characters in the text file. The word count should only include words (containing only alphabetic letters, no punctuation, spaces, or non-alphabetic characters). The character count should only include the characters inside those words.

This is what I have so far. I'm unsure how to count the characters. Every time I run the program, it jumps to the catch block as soon as I enter the file name (and there should be no issue with the file path, as I've used it before). I tried writing the program without the try/catch to see what the error was, but it wouldn't work without it.

Why is it jumping to the catch block when I enter the file name? How can I fix this program so that it properly counts the words, lines, and characters in the text file?


Solution

  • I don't get any exception with your code if I give it a valid file name. As for counting the characters, you should modify the logic a little. Instead of only adding up the word count, create a new tokenizer for each line, StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+");, then iterate through all the tokens and sum the length of each one. This gives you the number of characters. Keep in mind that StringTokenizer treats each character of the delimiter string as a separate delimiter, not as a regular expression. Something like the following:

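    while (fileScan.hasNextLine()) {
        lineC++;
        tempo = fileScan.nextLine();
        // StringTokenizer treats every character of this string as a separate
        // delimiter; it is not a regex, so '[', ']' and '+' also split tokens.
        StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+");
        wordC += st.countTokens(); // countTokens() does not consume the tokens
        while (st.hasMoreTokens()) {
            String stt = st.nextToken();
            System.out.println(stt); // display each token to confirm the line is split as expected
            charC += stt.length();
        }
    }
    // Print the totals once, after the whole file has been read
    System.out.println("Lines: " + lineC + "\nWords: " + wordC + "\nChars: " + charC);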
    

    Note: Escape sequences do not work with StringTokenizer. For example, you might expect "\\s" to split on any whitespace character, but it instead splits on the literal characters \ and s. If you need regular-expression matching, I suggest using java.util.regex.Pattern and java.util.regex.Matcher, and calling matcher.find() to identify the words and their characters.
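
The Pattern/Matcher approach mentioned in the note could look roughly like the sketch below. It is not the asker's original program: the class name WordStats, the count helper, and the sample lines are made up for illustration, and it assumes a "word" is a maximal run of ASCII letters (so digits and punctuation are skipped, and "o'clock" counts as two words).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class WordStats {
    // Assumption: a word is a maximal run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    /** Returns {lines, words, chars}; chars counts only the letters inside words. */
    static int[] count(List<String> lines) {
        int lineC = 0, wordC = 0, charC = 0;
        for (String line : lines) {
            lineC++;
            Matcher m = WORD.matcher(line);
            while (m.find()) {            // each match is one word
                wordC++;
                charC += m.group().length();
            }
        }
        return new int[] {lineC, wordC, charC};
    }

    public static void main(String[] args) throws IOException {
        // Read the file named on the command line, or fall back to a small sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("Hello, world!", "It is 2 o'clock.");
        int[] r = count(lines);
        System.out.println("Lines: " + r[0] + "\nWords: " + r[1] + "\nChars: " + r[2]);
    }
}
```

Run without arguments, this counts the two sample lines as 2 lines, 6 words, and 20 characters. Because main declares throws IOException instead of catching it, a bad file name produces a stack trace that names the actual problem, which may help when a catch block is being reached unexpectedly.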