I wrote a Java program that counts the number of characters in a file. To check that it works correctly, I compare its output with what this Linux command reports:
wc -m fileName
From the man page for wc, I know that the newline character is included in the count.
Here is my Java program:
import java.io.IOException;
import java.io.File;
import java.util.Scanner;

public class NumOfChars {
    /** The main method. */
    public static void main(String[] args) throws IOException {
        // Check that command is entered correctly
        if (args.length != 1) {
            System.out.println("Usage: java NumOfChars fileName");
            return; // stop here; args[0] below would otherwise throw
        }

        // Check that source file exists
        File file = new File(args[0]);
        if (!file.exists()) {
            System.out.printf("File %s does not exist\n", file);
            return; // stop here; new Scanner(file) would otherwise throw
        }

        // Create Scanner object
        Scanner input = new Scanner(file);
        int characters = 0;
        while (input.hasNext()) {
            String line = input.nextLine();
            // The number of characters is the length of the line plus the newline character
            characters += line.length() + 1;
        }
        input.close();

        // Print results
        System.out.printf("File %s has\n", args[0]);
        System.out.printf("%d characters\n", characters);
    }
}
The issue is that the number of characters reported by the Java program sometimes differs from the number reported by the wc command.
Here are two examples.
One that works: the file text.txt contains
This is some text
This is some text
This is some text
This is some text
This is some text
This is some text
This is some text
This is some text
The command wc -m text.txt tells me that this file has 144 characters. This is good, because when I run the Java program with java NumOfChars text.txt, I am also told that the file has 144 characters.
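The 144 is consistent with LF-only line endings: "This is some text" is 17 characters, plus one newline, times eight lines. The small sketch below (class and method names are hypothetical, and it assumes every line, including the last, ends with a line ending) does that arithmetic and also shows what wc -m would report if the same eight lines ended in CR + LF. A Scanner-based count would still report 144 in that case, because nextLine() returns only the 17 visible characters either way.

```java
public class LineEndingMath {

    /**
     * Characters in a file made of `lines` copies of `line`, each followed by
     * a line ending of `eolLength` characters (1 for LF, 2 for CR + LF).
     */
    static int expectedCount(String line, int lines, int eolLength) {
        return lines * (line.length() + eolLength);
    }

    public static void main(String[] args) {
        String line = "This is some text"; // 17 characters
        System.out.println(expectedCount(line, 8, 1)); // LF endings: 8 * 18 = 144
        System.out.println(expectedCount(line, 8, 2)); // CR + LF endings: 8 * 19 = 152
    }
}
```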
One that doesn't work: the file Exercise06.java contains
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
/** Converts a hexadecimal to a decimal. */
public class Exercise06 {
    /** Main method */
    public static void main(String[] args) {
        // Create a Scanner
        Scanner input = new Scanner(System.in);
        // Prompt the user to enter a string
        System.out.print("Enter a hex number: ");
        String hex = input.nextLine();
        // Display result
        System.out.println("The decimal value for hex number "
            + hex + " is " + hexToDecimal(hex.toUpperCase()));
    }
    /** Converts hexadecimal to decimal.
        @param hex The hexadecimal
        @return The decimal value of hex
        @throws NumberFormatException if hex is not a hexadecimal
    */
    public static int hexToDecimal(String hex) throws NumberFormatException {
        // Check if hex is a hexadecimal. Throw Exception if not.
        boolean patternMatch = Pattern.matches("[0-9A-F]+", hex);
        if (!patternMatch)
            throw new NumberFormatException();
        // Convert hex to a decimal
        int decimalValue = 0;
        for (int i = 0; i < hex.length(); i++) {
            char hexChar = hex.charAt(i);
            decimalValue = decimalValue * 16 + hexCharToDecimal(hexChar);
        }
        // Return the decimal
        return decimalValue;
    }
    /** Converts a hexadecimal Char to a decimal.
        @param ch The hexadecimal Char
        @return The decimal value of ch
    */
    public static int hexCharToDecimal(char ch) {
        if (ch >= 'A' && ch <= 'F')
            return 10 + ch - 'A';
        else // ch is '0', '1', ..., or '9'
            return ch - '0';
    }
}
The command wc -m Exercise06.java tells me that this file has 1650 characters. However, when I run java NumOfChars Exercise06.java, I am told that the file has 1596 characters.
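The gap here is 54 characters (1650 - 1596). One hedged way to see how such a gap can arise: the sketch below runs the same hasNext()/nextLine()/length() + 1 loop over in-memory strings instead of a file (the class ScannerCrlfDemo is hypothetical) and compares the result with the string's real length. Scanner.nextLine() strips the whole line separator, whether it is "\n" or "\r\n", so with CR + LF endings the loop undercounts by one character per line.

```java
import java.util.Scanner;

public class ScannerCrlfDemo {

    /** Same counting loop as NumOfChars, applied to a string instead of a file. */
    static int countLikeNumOfChars(String text) {
        Scanner input = new Scanner(text);
        int characters = 0;
        while (input.hasNext()) {
            // nextLine() drops the separator, so a trailing CR is never part of line
            String line = input.nextLine();
            characters += line.length() + 1;
        }
        input.close();
        return characters;
    }

    public static void main(String[] args) {
        String lf = "line one\nline two\n";
        String crlf = "line one\r\nline two\r\n";
        System.out.println(lf.length() + " vs " + countLikeNumOfChars(lf));     // 18 vs 18
        System.out.println(crlf.length() + " vs " + countLikeNumOfChars(crlf)); // 20 vs 18
    }
}
```

If CR + LF endings are the cause here, the file would contain 54 line endings, which wc -l Exercise06.java can confirm or rule out.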
I can't seem to figure out what I'm doing wrong. Can anyone provide me with some feedback?
EDIT: Here is what I get when typing head -5 Exercise06.java | od -c:
There are several possible explanations:
- Each line may end with more than one character. For example, on Windows each line ends with CR + LF ("\r\n"), whereas your program always counts exactly one line-ending character. Scanner.nextLine() strips the whole line separator, so the CR is never included in line.length().
- wc may assume a different character encoding than your program, which can lead to different character counts for multi-byte characters.
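To check both explanations on a real file, here is a sketch that decodes the whole file at once (so every CR and LF is kept) and reports several counts side by side: UTF-8 bytes (roughly what wc -c reports), UTF-16 code units (what String.length() returns), Unicode code points (roughly what wc -m reports in a UTF-8 locale) and the number of CR characters. The class name CharCounts is hypothetical, and the code assumes the file is UTF-8 encoded and fits in memory; it is a diagnostic, not a drop-in replacement for NumOfChars.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CharCounts {

    /** UTF-8 bytes: what a byte count such as wc -c sees. */
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** UTF-16 code units: what String.length() returns. */
    static int utf16Units(String text) {
        return text.length();
    }

    /** Unicode code points: a surrogate pair counts once, CR and LF count separately. */
    static long codePoints(String text) {
        return text.codePoints().count();
    }

    /** Number of CR characters; with CR + LF endings this equals the number of lines. */
    static long carriageReturns(String text) {
        return text.chars().filter(c -> c == '\r').count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("Usage: java CharCounts fileName");
            return;
        }
        // Assumption: the file is UTF-8; invalid bytes are decoded as U+FFFD,
        // so counts may differ from wc on files that are not valid UTF-8.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        String content = new String(data, StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes:      " + utf8Bytes(content));
        System.out.println("UTF-16 units:     " + utf16Units(content));
        System.out.println("Code points:      " + codePoints(content));
        System.out.println("CR characters:    " + carriageReturns(content));
    }
}
```

For the string "café 😀\r\n" the four methods return 12, 9, 8 and 1: é takes two UTF-8 bytes, the emoji takes four UTF-8 bytes but two UTF-16 units, and each of them is a single code point.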