I have a large text file about 1 million words. I am doing this for an android phone game, and I am just trying to see if a word exists in the text file. Loading anything into memory isn't an option. The android phone memory and processor is so weak, that reading this file takes about 20 seconds.
I have modified this text file with the words, to be on equal width. Each word is 50 characters + 1 for newline. However, I am getting a little bit confused with how to properly implement a binary search, as I keep getting confused on how many bytes I should add for seek() to work properly.
public static long search(RandomAccessFile file, String target)
throws IOException {
file.seek(0);
String line = file.readLine();
if(line.equals(target))
return 1;
long start = 0;
long end = file.length();
long mid = (start + end -50)/2;
while(start <= end)
{
file.seek(mid);
line = file.readLine();
if(line.compareTo(target) < 0)
start = mid + 51;
else if(line.equalsIgnoreCase(target))
return 1;
else
end = mid - 51;
mid = (start + end)/2;
}
if(start > end)
return 0;
return -1;
}
First time I set end I subtract 50 because the very last word has no new line. After a couple of iterations this stops working properly. I can't figure out how to properly make this work. Can anyone guide me on what I am doing wrong?
By wrapping file in an AbstractList, you can leverage out of box binary search implementation:
final int size = (int) ((file.length() + LINE_BREAK_LEN) / (WORD_LEN + LINE_BREAK_LEN));
return Collections.binarySearch(
new AbstractList<String>() {
public String get(int pIdx) {
try {
file.seek((WORD_LEN + LINE_BREAK_LEN) * pIdx);
return file.readLine();
} catch (IOException ex) {
throw new RuntimeException(ex);
}
}
public int size() {return size;}
},
target,
Comparator.comparing(String::toLowerCase)
);
Note that line breaks just complicate the code and could be omitted from file.