Search code examples
javascalareflectionapache-commons-collectionpatricia-trie

Why `floorEntry` and other methods are not accessible in PatriciaTrie?


While implementing an ip-lookup structure, I was trying to maintain a set of keys in a trie-like structure that allows me to search the "floor" of a key (that is, the largest key that is less or equal to a given key). I decided to use Apache Collections 4 PatriciaTrie but sadly, I found that the floorEntry and related methods are not public. My current "dirty" solution is forcing them with reflection (in Scala):

val pt = new PatriciaTrie[String]()
val method = pt.getClass.getSuperclass.getDeclaredMethod("floorEntry", classOf[Object])
method.setAccessible(true)
// and then for retrieving the entry for floor(key) 
val entry = method.invoke(pt, key).asInstanceOf[Entry[String, String]]

Is there any clean way to have the same functionality? Why this methods are not publicly available?


Solution

  • Why those methods are not public, I don't know. (Maybe it's because you can achieve what you want with common Map API).

    Here's a way to fulfil your requirement:

    PatriciaTrie<String> trie = new PatriciaTrie<>();
    trie.put("a", "a");
    trie.put("b", "b");
    trie.put("d", "d");
    
    String floorKey = trie.headMap("d").lastKey(); // d
    

    According to the docs, this is very efficient, since it depends on the number of bits of the largest key of the trie.

    EDIT: As per the comment below, the code above has a bounds issue: headMap() returns a view of the map whose keys are strictly lower than the given key. This means that, i.e. for the above example, trie.headMap("b").lastKey() will return "a", instead of "b" (as needed).

    In order to fix this bounds issue, you can use the following trick:

    String cFloorKey = trie.headMap("c" + "\uefff").lastKey(); // b
    
    String dFloorKey = trie.headMap("d" + "\uefff").lastKey(); // d
    

    Now everything works as expected, since \uefff is the highest unicode character. Actually, searching for key + "\uefff", whatever key is, will always return key if it belongs to the trie, or the element immediately prior to key, if key is not present in the trie.

    Now, this trick works for String keys, but is extensible to other types as well. i.e. for Integer keys you could search for key + 1, for Date keys you could add 1 millisecond, etc.