I have a HashMap of terms. For each word, it stores the pages the word appears on, the word's frequency on each page, and its positions on that page.
Ex: Word - [page number, word frequency in page, positions in page ]
cat [1, 3, 1, 2, 5 ], [2, 2, 2, 5 ]
dog [2, 2, 1, 7 ]
How would I store this info in a binary file that is easy to read back?
I made the following attempt:
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(baos);
    for (String word : invertedIndex.keySet()) {
        out.writeUTF(word); // Write the word
        for (Entry entry : invertedIndex.get(word)) { // Info for a page
            out.writeInt(entry.pageNum); // Write its page number
            out.writeInt(entry.wordFrequency); // Write its freq in that page
            for (int position : entry.positions) {
                out.writeInt(position); // Write the positions
            }
        }
    }
    byte[] bytes = baos.toByteArray();
    FileOutputStream fos = new FileOutputStream(PATH);
    fos.write(bytes);
    fos.close();
Not sure if this is correct... Thanks in advance.
Edit: Thanks. It turns out my problem is more about how to decode this than strictly how to encode it.
Is there a way to preserve this data structure?
Yes. Lots of ways.
Hint: Your attempted solution is a good start.
However, a complete solution requires a corresponding method to read the data back. And when you attempt to write a read method that corresponds to your write code, you will discover that there is a systemic problem. For example, there is no easy way to figure out where one list of int values ends and the next one begins.
There are ways to solve that. Think about it. How can you write two lists one after another so that you know where one ends and the next begins?
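One common answer to that question is to write a count before each list, so the reader knows how many items to expect. The following is only a minimal sketch of that idea, not the only way to do it. `PageEntry` is a hypothetical stand-in for the question's `Entry` class (its `positions` field is assumed to be an `int[]`), and the class and method names are made up for illustration. In the sample data the number of positions happens to equal the frequency, so the frequency could double as the count; the sketch writes an explicit count anyway so that it does not rely on that.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LengthPrefixedIndex {

    // Hypothetical stand-in for the question's Entry class.
    static class PageEntry {
        final int pageNum;
        final int wordFrequency;
        final int[] positions;

        PageEntry(int pageNum, int wordFrequency, int[] positions) {
            this.pageNum = pageNum;
            this.wordFrequency = wordFrequency;
            this.positions = positions;
        }
    }

    // Every variable-length part is preceded by its size.
    static void write(Map<String, List<PageEntry>> index, DataOutputStream out)
            throws IOException {
        out.writeInt(index.size());                 // number of words
        for (Map.Entry<String, List<PageEntry>> e : index.entrySet()) {
            out.writeUTF(e.getKey());               // the word (writeUTF stores its own length)
            out.writeInt(e.getValue().size());      // number of page entries for this word
            for (PageEntry entry : e.getValue()) {
                out.writeInt(entry.pageNum);
                out.writeInt(entry.wordFrequency);
                out.writeInt(entry.positions.length); // number of positions that follow
                for (int position : entry.positions) {
                    out.writeInt(position);
                }
            }
        }
    }

    // Mirrors write(): read each count first, then exactly that many items.
    static Map<String, List<PageEntry>> read(DataInputStream in) throws IOException {
        int wordCount = in.readInt();
        Map<String, List<PageEntry>> index = new LinkedHashMap<>();
        for (int w = 0; w < wordCount; w++) {
            String word = in.readUTF();
            int entryCount = in.readInt();
            List<PageEntry> entries = new ArrayList<>(entryCount);
            for (int i = 0; i < entryCount; i++) {
                int pageNum = in.readInt();
                int wordFrequency = in.readInt();
                int[] positions = new int[in.readInt()];
                for (int p = 0; p < positions.length; p++) {
                    positions[p] = in.readInt();
                }
                entries.add(new PageEntry(pageNum, wordFrequency, positions));
            }
            index.put(word, entries);
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<PageEntry>> index = new LinkedHashMap<>();
        index.put("cat", Arrays.asList(
                new PageEntry(1, 3, new int[] {1, 2, 5}),
                new PageEntry(2, 2, new int[] {2, 5})));
        index.put("dog", Arrays.asList(new PageEntry(2, 2, new int[] {1, 7})));

        // Round trip through an in-memory buffer to check that read() mirrors write().
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(index, new DataOutputStream(buffer));
        Map<String, List<PageEntry>> copy =
                read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        System.out.println(copy.keySet());
    }
}
```

The alternative to a count is a terminator value (for example a negative int after each list), which works only if that value can never occur as real data.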
Note: your use of ByteArrayOutputStream is unnecessary. You can write directly to a FileOutputStream wrapped in a BufferedOutputStream.
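As a rough illustration of that note, the stream chain might look like the sketch below. The class name, method names, and the int-array payload are placeholders chosen for the example, not part of the question's code; the point is only the `DataOutputStream` over `BufferedOutputStream` over `FileOutputStream` wrapping, with try-with-resources closing (and flushing) the whole chain.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class DirectFileWrite {

    // Writes straight to the file; no intermediate byte[] is built.
    static void writeInts(String path, int[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            out.writeInt(values.length);   // placeholder payload: a count, then the values
            for (int value : values) {
                out.writeInt(value);
            }
        } // closing the outermost stream flushes the buffer and closes the file
    }

    // The matching read side uses the same wrapping with input streams.
    static int[] readInts(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return values;
        }
    }
}
```

The buffering matters because `DataOutputStream.writeInt` otherwise passes each value's bytes to the underlying file stream in very small writes.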