Search code examples
javafilebinary-search

Performing a binary search on a text file using Java


I have a large text file about 1 million words. I am doing this for an android phone game, and I am just trying to see if a word exists in the text file. Loading anything into memory isn't an option. The android phone memory and processor is so weak, that reading this file takes about 20 seconds.

I have modified this text file with the words, to be on equal width. Each word is 50 characters + 1 for newline. However, I am getting a little bit confused with how to properly implement a binary search, as I keep getting confused on how many bytes I should add for seek() to work properly.

public static long search(RandomAccessFile file, String target)
            throws IOException {

    file.seek(0);
    String line = file.readLine();

    if(line.equals(target))
        return 1;

    long start = 0;
    long end = file.length();
    long mid = (start + end -50)/2;

    while(start <= end)
    {
        file.seek(mid);
        line = file.readLine();
        if(line.compareTo(target) < 0)
            start = mid + 51;
        else if(line.equalsIgnoreCase(target))
            return 1;
        else
            end = mid - 51;

        mid = (start + end)/2;
    }

    if(start > end)
        return 0;

    return -1;
}

First time I set end I subtract 50 because the very last word has no new line. After a couple of iterations this stops working properly. I can't figure out how to properly make this work. Can anyone guide me on what I am doing wrong?


Solution

  • By wrapping file in an AbstractList, you can leverage out of box binary search implementation:

    final int size = (int) ((file.length() + LINE_BREAK_LEN) / (WORD_LEN + LINE_BREAK_LEN));
    return Collections.binarySearch(
        new AbstractList<String>() {
            public String get(int pIdx) {
                try {
                    file.seek((WORD_LEN + LINE_BREAK_LEN) * pIdx);
                    return file.readLine();
                } catch (IOException ex) {
                    throw new RuntimeException(ex);
                }
            }
    
            public int size() {return size;}
        },
        target,
        Comparator.comparing(String::toLowerCase)
    );
    

    Note that line breaks just complicate the code and could be omitted from file.