I am building a Huffman tree to compress a text file, but I am running into a problem. The method I am writing is supposed to take a FileInputStream that supplies the text data and return a Map of characters to their counts. To do that, I need to choose a size for the byte[] that stores the data. The problem is that the byte[] has to be exactly the right length, or else the Map ends up containing unneeded data. Is there a way to make the byte[] just the right size?
Here is my code:
// provides a count of characters in an input file and places them in a map
public static Map<Character, Integer> getCounts(FileInputStream input)
        throws IOException {
    Map<Character, Integer> output = new TreeMap<Character, Integer>(); // treemap keeps keys in sorted order (chars alphabetized)
    byte[] fileContent = new byte[100]; // creates a byte[]
    //ArrayList<Byte> test = new ArrayList<Byte>();
    input.read(fileContent); // reads the input into fileContent
    String test = new String(fileContent); // holds the entire file as a string to process
    // goes through each character of the String to put chars as keys and occurrences as values
    for (int i = 0; i < test.length(); i++) {
        char temp = test.charAt(i);
        if (output.containsKey(temp)) { // seen this character before; increase count
            int count = output.get(temp);
            System.out.println("repeat; char is: " + temp + " count is: " + count);
            output.put(temp, count + 1);
        } else { // haven't seen this character before; create count of 1
            System.out.println("new; char is: " + temp + " count is: 1");
            output.put(temp, 1);
        }
    }
    return output;
}
FileInputStream.read() returns the number of bytes actually read, or -1 at EOF. You can use this value instead of test.length() as the bound of the for loop, so that the unused tail of the buffer is never counted.
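As a minimal sketch of that idea, the snippet below decodes only the bytes that a single read() call filled. The class name ReadLengthDemo and the helper readOnce are hypothetical, a ByteArrayInputStream stands in for the file, and ISO-8859-1 is assumed so that one byte maps to exactly one char:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadLengthDemo {
    // Hypothetical helper: decodes only the bytes that one read() call filled.
    public static String readOnce(InputStream input, int bufferSize) throws IOException {
        byte[] fileContent = new byte[bufferSize];
        int bytesRead = input.read(fileContent);
        if (bytesRead == -1) {
            return ""; // the stream was already at EOF
        }
        // Skip the unused tail of the buffer, which still holds zero bytes.
        // ISO-8859-1 is an assumption: it maps every byte to exactly one char.
        return new String(fileContent, 0, bytesRead, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "hello".getBytes(StandardCharsets.ISO_8859_1));
        String text = readOnce(in, 100);
        System.out.println(text.length()); // prints 5, not 100
    }
}
```

A single call like this still only covers input that fits in the buffer and happens to arrive in one read.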
Note that read() is not guaranteed to fill the whole buffer, even when the end of the file has not been reached, so it is usually called in a loop:
byte[] buf = new byte[1024]; // buffer size is arbitrary
int bytesRead;
// Read until there are no more bytes to read.
while ((bytesRead = input.read(buf)) != -1) {
    // The next bytesRead bytes are in buf[0 .. bytesRead - 1] here.
}
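For a Huffman coder that works on raw bytes, the loop above can feed a frequency table directly. The following is only a sketch under that assumption: the class ByteCounts and the method countBytes are hypothetical names, and the int[256] table is a substitute for the Map in the question, not the asker's design.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ByteCounts {
    // Counts how often each byte value (0..255) occurs in the stream.
    public static int[] countBytes(InputStream input) throws IOException {
        int[] counts = new int[256];
        byte[] buf = new byte[4096]; // buffer size is arbitrary
        int bytesRead;
        while ((bytesRead = input.read(buf)) != -1) {
            // Only the first bytesRead entries of buf are valid on this pass.
            for (int i = 0; i < bytesRead; i++) {
                counts[buf[i] & 0xFF]++; // mask converts the signed byte to 0..255
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "abracadabra".getBytes(StandardCharsets.US_ASCII));
        int[] counts = countBytes(in);
        System.out.println(counts['a'] + " " + counts['b'] + " " + counts['r']); // prints 5 2 2
    }
}
```

Because the buffer is reused on every pass, its size no longer has to match the file length.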
Finally, if the text is in a multi-byte encoding such as UTF-8, this approach will not work, because read() can stop in the middle of a character. Consider wrapping the FileInputStream in an InputStreamReader:
Reader fileReader = new InputStreamReader(input, "UTF-8");
char[] buf = new char[256];
int charsRead;
while ((charsRead = fileReader.read(buf)) != -1) {
    // The next charsRead characters are in buf[0 .. charsRead - 1] here.
}
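Putting the pieces together, getCounts from the question could look roughly like the sketch below. It assumes the file is UTF-8, accepts any InputStream (a FileInputStream would work the same way), and uses a hypothetical wrapper class CharCounts; Map.merge replaces the containsKey/put branches.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class CharCounts {
    // Counts each char in the stream; UTF-8 is an assumption about the input file.
    public static Map<Character, Integer> getCounts(InputStream input) throws IOException {
        Map<Character, Integer> output = new TreeMap<>(); // keys kept in sorted order
        Reader reader = new InputStreamReader(input, StandardCharsets.UTF_8);
        char[] buf = new char[256];
        int charsRead;
        while ((charsRead = reader.read(buf)) != -1) {
            // Only the first charsRead entries of buf are valid on this pass.
            for (int i = 0; i < charsRead; i++) {
                output.merge(buf[i], 1, Integer::sum); // insert 1 or add 1 to the old count
            }
        }
        return output;
    }

    public static void main(String[] args) throws IOException {
        // "h\u00e9llo" contains one two-byte UTF-8 character (e with acute accent).
        InputStream in = new ByteArrayInputStream(
                "h\u00e9llo".getBytes(StandardCharsets.UTF_8));
        Map<Character, Integer> counts = getCounts(in);
        System.out.println(counts.size() + " distinct, l=" + counts.get('l'));
    }
}
```

Note that this counts Java char values, so a character outside the Basic Multilingual Plane would show up as two surrogate entries.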