I'm trying to write a large text file to a binary file, but the binary file ends up the same size as my text file. I thought that writing to a binary file would compress it? Is writing to a binary file just more efficient? How can I reduce the storage my text file takes up?
import java.io.*;

File f = new File("words.txt");

// Copy the bytes of words.txt into words.ser, unchanged.
try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(f));
     DataOutputStream out = new DataOutputStream(
         new BufferedOutputStream(
             new FileOutputStream("words.ser")))) { // was "word.ser"; the length check below reads "words.ser"
    byte[] buffer = new byte[8192]; // or more, or even less, anything > 0
    int count;
    while ((count = in.read(buffer)) > 0) {
        out.write(buffer, 0, count);
    }
}

/* ObjectOutputStream oos = new ObjectOutputStream(
       new BufferedOutputStream(
           new FileOutputStream("words.ser")
       )); */

System.out.println(f.length());    // size of words.txt
File file = new File("words.ser");
System.out.println(file.length()); // same size: the bytes were copied verbatim
You're confused.
There's no such thing as a 'text' file or a 'binary' file, at least not to a hard disk or a filesystem. A file is a bag of bytes. They all are. Just.. bytes.
Now, if the bytes happen to form a sequence that, say, Microsoft Word will correctly read in when you pick that file from its 'file open' menu, we may say 'this is a Word file'. The filesystem cares absolutely nothing for such frivolous human distinctions. It was asked to provide the bytes in a file named 'foo.doc' and it did so, in the exact same fashion it would have had Word asked it for the bytes of 'foo.txt' or 'foo.jpg'. It's up to Word to complain if the bytes don't make sense to it.
So, what's a 'text file'? The same deal applies: if a text-editing tool asks the file system to open a file, and it 'works', I guess we can call it a text file. To the file system, it's.. just a file.
And now you know why writing the file through an OutputStream or a BufferedWriter or whatnot makes no difference. That just changes the precise mechanism by which the characters end up in byte form. Assuming they're simple ASCII characters, it's 1 byte per character, simple as that.
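A quick sketch of that point: for plain ASCII input, the encoded byte count equals the character count no matter which charset does the encoding (the string here is just an arbitrary example).

```java
import java.nio.charset.StandardCharsets;

public class OneBytePerChar {
    public static void main(String[] args) {
        String s = "hello world"; // 11 ASCII characters
        // Both encodings map each ASCII character to exactly one byte.
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8  = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(ascii.length); // prints 11
        System.out.println(utf8.length);  // prints 11
    }
}
```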
If you want it to be smaller, you'd have to use a compression algorithm, like gzip. Note that, obviously, random data cannot be compressed. The only 'compression' you get is the amount of non-entropy (redundancy) inherent in the data that your compression algorithm manages to find and encode in a more efficient form. The other answer shows one easy way to do this.
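As a sketch of the gzip approach applied to your copy loop (the file names are placeholders): the only change is wrapping the output stream in a java.util.zip.GZIPOutputStream, which compresses the bytes as they pass through.

```java
import java.io.*;
import java.util.zip.GZIPOutputStream;

public class GzipCopy {
    public static void main(String[] args) throws IOException {
        // "words.txt" / "words.txt.gz" are placeholder names for this sketch.
        try (InputStream in = new BufferedInputStream(new FileInputStream("words.txt"));
             OutputStream out = new GZIPOutputStream(
                     new BufferedOutputStream(new FileOutputStream("words.txt.gz")))) {
            byte[] buffer = new byte[8192];
            int count;
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count); // bytes are deflated on the way out
            }
        }
        System.out.println(new File("words.txt").length());
        System.out.println(new File("words.txt.gz").length()); // usually much smaller for text
    }
}
```

How much smaller the result is depends entirely on how repetitive the text is; a word list compresses very well, random bytes not at all.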