Search code examples
stringalgorithmdata-structurestreebk-tree

Understanding BK Trees: How do we derive the (d-n, d+n) range from the triangle inequality?


Reading this post about BK Trees, I found the following snippet a bit confusing:

"Assume for a moment we have two parameters, query, the string we are using in our search, and n the maximum distance a string can be from query and still be returned. Say we take an arbitary string, test and compare it to query. Call the resultant distance d. Because we know the triangle inequality holds, all our results must have at most distance d+n and at least distance d-n from test."

I can intuitively see that if something is d away from the word I'm searching with and I have a tolerance of n error, then I will need at least d-n distance from the word I'm at to "reverse" the differences. Similarly I can have at most d+n because after "reversing" the differences, I can introduce n more differences.

I'm confused how the triangle inequality was used to get this. If we let d(test, query) = d and d(query, found) <= n then by the inequality:

d(test, query) + d(test, nextWordToSearch) >= d(query, found)
d + d(test, nextWordToSearch) >= n

How can we find

d - n <= d(test, nextWordToSearch) <= d + n

Solution

  • Using @templatetypedef's answer, I was able to use the Triangle Inequality for finding both the upper and lower bound.

    d(query, desiredNode) = n
    d(query, test) = d1
    
    d(query, test) + d(test, desiredNode) >= d(query, desiredNode)
    d1 + d(test, desiredNode) >= n
    d(test, desiredNode) >= |n - d1|
    
    d(test, query) + d(query, desiredNode) >= d(test, desiredNode)
    |d1 + n| >= d(test, desiredNode)
    

    Hence:

    |d1 + n| >= d(test, desiredNode) >= |d1 - n|
    

    Absolute values used because of property of non-negative measure.