Tags: java, performance, search, dictionary, processing-efficiency

How to search for a word in a dictionary efficiently in Java without a linear search: reducing the search space


I have a dictionary with 120000+ words. I want to search through it in an efficient way to check whether it contains a certain word.

I want to look at the first character of the given string and then search only from the letter before it to the letter after it (to reduce the search space).

For example, if the word is "stack", I would like to start at the words beginning with 'r' and finish at the words beginning with 't'. Those two points would be the start position and the finish position.

So far I have done this:

    inputFile = new Scanner(myFile);

    while (inputFile.hasNext()) {
        fileLine = inputFile.nextLine();

        dictWords.add(fileLine);

        no++;
    }

    HelperClass.setSearchPos(dictWords, "syncope", 0, dictWords.size());

    public static void setSearchPos(ArrayList<String> dictList, String str, int startSearchPoint, int finishSearchPoint) {

        ArrayList<String> reducedSearchWords = new ArrayList<String>();

        initSearchPos = startSearchPoint;
        finalSearchPos = finishSearchPoint - 1;
        int midPos = (initSearchPos + finalSearchPos) / 2;
        char startWordChar = dictList.get(initSearchPos).charAt(0);
        char finishWordChar = dictList.get(finalSearchPos).charAt(0);

        startWordChar = shiftChar(startWordChar, 1);
        finishWordChar = shiftChar(finishWordChar, -1);

        while (startWordChar < str.charAt(0) &&
                finishWordChar > str.charAt(0)) {
            if (dictList.get(midPos).charAt(0) > str.charAt(0)) {

                setSearchPos(dictList, str, 0, midPos);
            }

            if (dictList.get(midPos).charAt(0) < str.charAt(0)) {

                setSearchPos(dictList, str, midPos, finalSearchPos);
            }
        }
        System.out.println("Star Pos " + initSearchPos);
        System.out.println("Mid Pos " + midPos);
        System.out.println("Finish Pos " + finalSearchPos);
    }

    public static char shiftChar(char c, int key) {

        char shiftedChar;
        shiftedChar = (char) ((char) c + key);

        // This is used to bind the characters between lowercase a-z
        if (shiftedChar > 122) {
            shiftedChar = (char) ((char) c - 123 + 97 + key);
        }
        return shiftedChar;

    }

The output is:

Star Pos 88978
Mid Pos 96382
Finish Pos 103787
Star Pos 88978
Mid Pos 96382
Finish Pos 103786
Star Pos 88978
Mid Pos 96381
Finish Pos 103785

I am happy with Star Pos and Mid Pos, but the loop keeps running until Finish Pos reaches 0 and then throws an IndexOutOfBoundsException.

Any suggestions?


Solution

  • The most conventional approach is binary search, which requires the word list to be sorted.

    Another method is to index the dictionary by starting letter and then jump straight to that index. This only pays off if you run multiple searches; for a single search, plain binary search is the better choice.

    You can also combine indexing and binary search when doing multiple searches, which makes each search even faster.
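
Both suggestions can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the class name `DictionaryLookup` and its methods are hypothetical, and it assumes the words are lower-case and start with a letter from a to z. `containsBinary` is a plain binary search via `Collections.binarySearch`; `containsIndexed` first jumps to the range of words sharing the first letter and then binary-searches only that range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not the question's HelperClass); assumes lower-case a-z words. */
final class DictionaryLookup {
    private final List<String> words = new ArrayList<>();
    // start[i] is the index of the first word beginning with ('a' + i);
    // start[26] is the size of the list, so bucket i is [start[i], start[i + 1]).
    private final int[] start = new int[27];

    DictionaryLookup(List<String> dictionary) {
        for (String w : dictionary) {
            if (!w.isEmpty()) {
                words.add(w);           // skip blank lines from the input file
            }
        }
        Collections.sort(words);        // binary search requires sorted input
        int pos = 0;
        for (int letter = 0; letter < 26; letter++) {
            // advance past every word whose first character sorts before this letter
            while (pos < words.size() && words.get(pos).charAt(0) < 'a' + letter) {
                pos++;
            }
            start[letter] = pos;
        }
        start[26] = words.size();
    }

    /** Binary search over the whole sorted list. */
    boolean containsBinary(String word) {
        return Collections.binarySearch(words, word) >= 0;
    }

    /** Binary search restricted to the words that share the first letter. */
    boolean containsIndexed(String word) {
        if (word.isEmpty()) {
            return false;
        }
        int letter = word.charAt(0) - 'a';
        if (letter < 0 || letter > 25) {
            return false;               // outside the assumed a-z range
        }
        List<String> bucket = words.subList(start[letter], start[letter + 1]);
        return Collections.binarySearch(bucket, word) >= 0;
    }

    public static void main(String[] args) {
        DictionaryLookup dict = new DictionaryLookup(
                Arrays.asList("stack", "syncope", "apple", "zebra", "run"));
        System.out.println(dict.containsBinary("syncope"));  // true
        System.out.println(dict.containsIndexed("stack"));   // true
        System.out.println(dict.containsIndexed("stick"));   // false
    }
}
```

For a list of about 120000 words, a plain binary search already needs only around 17 comparisons per lookup, so the letter index saves just a few steps per search; it is mainly worthwhile when many lookups reuse the same index.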