Search code examples
javadata-structuresternary-treeternary-search-tree

Case Insensitive Ternary Search Tree


I had been using Ternary Search Tree for a while, as the data structure to implement a auto complete drop down combo box. Which means, when user type "fo", the drop down combo box will display

foo food football

The problem is, my current used of Ternary Search Tree is case sensitive. My implementation is as follow. It had been used by real world for around 1++ yeas. Hence, I consider it as quite reliable.

My Ternary Search Tree code

However, I am looking for a case insensitive Ternary Search Tree, which means, when I type "fo", the drop down combo box will show me

foO Food fooTBall

Here are some key interface for TST, where I hope the new case insentive TST may have similar interface too.

    /** 
 * Stores value in the TernarySearchTree. The value may be retrieved using key.
 * @param key A string that indexes the object to be stored.
 * @param value The object to be stored in the tree.
 */    
public void put(String key, E value) {
    getOrCreateNode(key).data = value;
}

/**
 * Retrieve the object indexed by key.
 * @param key A String index.
 * @return Object The object retrieved from the TernarySearchTree.
 */
public E get(String key) {
    TSTNode<E> node = getNode(key);
    if(node==null) return null;
    return node.data;
}

An example of usage is as follow. TSTSearchEngine is using TernarySearchTree as the core backbone.

Example usage of Ternary Search Tree

// There is stock named microsoft and MICROChip inside stocks ArrayList.
TSTSearchEngine<Stock> engine = TSTSearchEngine<Stock>(stocks);
// I wish it would return microsoft and MICROCHIP. Currently, it just return microsoft.
List<Stock> results = engine.searchAll("micro"); 

Solution

  • One of the key factor which make my current Ternary Search Tree difficult to support case insensitive search is that, my underlying data structure is one-to-one mapping. Please look at the following test code :

    public void testPut() {
        System.out.println("put");
        Name name0 = new Name("abc");
        Name name1 = new Name("abc");
        TernarySearchTree<Name> instance = new TernarySearchTree<Name>();
        instance.put(name0.toString(), name0);
        instance.put(name1.toString(), name1);
        assertEquals(2, instance.matchPrefix("a").size()); // fail here. Result is 1
    }
    

    What my current short-term solution is that, I am using TSTSearchEngine to wrap up the whole TernarySearchTree. TSTSearchEngine is comprised of

    (1) A TernarySearchTree, providing UPPER-CASE key to map.

    (2) A String-To-ArrayList map.

    Here is what happen when I perform :

    TSTSearchEngine<Name> engine = TSTSearchEngine<Name>();
    engine.put(name0); // name0 is new Name("Abc");
    engine.put(name1); // name0 is new Name("aBc");
    

    (1) name0.toString() will be converted to UPPER-CASE ("ABC"). It will be inserted to TernarySearchTree. "ABC" will be both key and value for TernarySearchTree.

    (2) "ABC" will use as the key for map, to insert name0 into an array list.

    (3) name1.toString() will be converted to UPPER-CASE ("ABC"). It will be inserted to TernarySearchTree. S1 will be both key and value for TernarySearchTree.

    (4) "ABC" will use as the key for map, to insert name1 into an array list.

    When I try to

    engine.searchAll("a");
    

    (1) TernarySearchTree will return "ABC".

    (2) "ABC" will be used as the key to access map. Map will return an array list, which is containing name0 and name1.

    This solution works. The sample code can be referred to Sample Code for New TSTSearchEngine

    However, this may not be an effective solution, as it requires two pass of search. I find out there is an implementation in C++ C++ Implementation of Case Insensitive Ternary Search Tree. Hence, there is an opportunity that C++ code can be ported over to Java.