Search code examples
c#.netrecursioninfinite-looplevenshtein-distance

Why is this code producing an exponential loop? .Net, Lehvenstein Distance


So recently I embarked on a coding project to try create some code to mathematically create a way to depict how similar two strings are. On my research I found plenty of examples online to help me create the code I desired. I have an error with one I found which in my running of it is creating, what I am calling, exponential loops. It's not an infinite loop, it runs and it solves problems, but the longer the strings I pass into the method the exponentially longer the method runs for. The code is here below

public static int LevenshteinDistance(this string source, string target)
    {
        Console.WriteLine("Start of method");
        if (source.Length == 0) { return target.Length; }
        if (target.Length == 0) { return source.Length; }

        int distance = 0;

        Console.WriteLine("Distance creation");
        if (source[source.Length - 1] == target[target.Length - 1]) { distance = 0; }
        else { distance = 1; }

        Console.WriteLine("Recursive MethodCall");
        return Math.Min(Math.Min(LevenshteinDistance(source.Substring(0, source.Length - 1), target) + 1,
                                 LevenshteinDistance(source, target.Substring(0, target.Length - 1))) + 1,
                                 LevenshteinDistance(source.Substring(0, source.Length - 1), target.Substring(0, target.Length - 1)) + distance);
    }

So, for smaller strings this method runs just fine however, when I start to pass in things like addresses or long names it takes a long time. So long in fact I disbanded this method entirely and wrote another one (i'll provide this if anybody wants it but it's not important for the question) which serves my purpose but in the interest of solving problems and learning, I tried to figure out exactly why this one was taking so long when coded recursively. I stepped through my code with pen and paper, in debug mode and when I get to the recursive call here

return Math.Min(Math.Min(LevenshteinDistance(source.Substring(0, source.Length - 1), target) + 1,
                                 LevenshteinDistance(source, target.Substring(0, target.Length - 1))) + 1,
                                 LevenshteinDistance(source.Substring(0, source.Length - 1), target.Substring(0, target.Length - 1)) + distance);
    }

And I can work out what is happening solidly for these parts

return Math.Min(***Math.Min(LevenshteinDistance(source.Substring(0, source.Length - 1), target) + 1,
                                 LevenshteinDistance(source, target.Substring(0, target.Length - 1))) + 1,***
                                 LevenshteinDistance(source.Substring(0, source.Length - 1), target.Substring(0, target.Length - 1)) + distance);
    }

I tried to bolden and italicize the part I mean, it's the part with '***' around it. Getting to that part I understand but then the next part, the third LevenshteinDistance Call and it's first iteration I start to lose focus on recursion and how it works and where my confusion starts. This part

LevenshteinDistance(source.Substring(0, source.Length - 1), target.Substring(0, target.Length - 1)) + distance);
    }

The code from what I gather once it reaches this call then starts to do the first LevenshteinDistance call all over again and reiterating over the calls it just performed. This is what I'm being confused on. Why does it call the first part of the recursive call again and then the second and what is causing the time to complete the code to lengthen exponentially?

Note: The time may not be exponentially in the literal sense, the times for running a short string comparison were sub-second and once the strings got a little longer it jumper from sub-second -> ~15 seconds -> 2:30 mins -> 35 minutes

2nd Note: Tagged as infinite loop as exponential loop doesn't exist and this somewhat is close to it.


Solution

  • Because it's recursive, not just single recursion (like you would use for a treeview), this puppy passes the return result of itself 3 recursive calls!

    That's why your seeing exponential timing increases with longer strings.