algorithm dynamic-programming probability np-complete

color-coding algorithm for the longest path

before anything i would like to make clear yes it is for an college assignment, and I'm looking for help on understand the algorithm to be able to implement it. So I've this assignment that cam be found here: https://www.labri.fr/perso/dorbec/AA/projet-uno.pdf

Basically we have a set of "cards" represented by 2 int one representing the color of a card and another for the number. The work to be done is find the longest sequence of cards like a UNO game where the next card is or the same color or the same number. For that a series of algorithms have being implemented during the curse, but the last one that we have to implement is "color-coding" and right now I'm running out of time and still not really clear how it works. I will put the images of the text here to keep the format.

Any help on understand it will be grateful.

Thanks

Solution

I think I might be able to help you. Your problem can be easily transformed to the longest path problem. To find a path of length k was really hard to solve for a long time. Even when k was relatively small, say log (n), people still thought it was not possible in polynomial time.

The basic idea: You color a graph a lot of times and hope that you accidentally color a path of length k. Well, the probability of that is very small, but if you repeat it a lot of times, the probability actually becomes very large.

But first, let's assume there is a colored path in the graph. What does "colored" even mean? Colored means that the path of length k-1, which has k nodes, is colored in k different colors. If you start at a node v, which has the color 1, what would you do? You would look at your neighbors and see if they have a different color than you. If they do, you add them to a set, called P(1). You would continue with one of the neighbors and see if they have a color, which hasn't appeared in the color set yet. You would do this as long as you find new colors that you haven't seen yet OR you reach k-1 colors OR you see that one of the nodes has a color you have already seen. In the last case, you abort the mission and go back to where everything was still good.

Important: We color the nodes. A path of length i, has i+1 nodes and i edges.

More formal: Let P i(v):= {S is element of ([k] choose i+1)| there exists a path, colored with the colors in S, which ends in v} P is not a path. P contains some sets of colors that we call S. In this case, S is not a number. It is a subset of cardinality i+1 of different colors. For example: P 3(v) could look like this = {{green, red, black, yellow}, {pink, orange, grey, yellow},{...}}. Notice that "yellow" twice? If I would have continued with a few more subsets, "yellow" would have appeared in EVERY set. Why? Because it is the color of our end node v! So P i(v) holds at least one set S of all different colors that one path of length i is colored in. This path ends in v!

What does that mean? If we can compute P k-1(v), and the set is NOT empty. There exists a path of length k. Awesome.

But how do we compute P k-1(v)? It's not that hard: If we want to compute P i(v). What do we need? We need P i-1(x). X? Why? x is a neighbor of v! ---> g ----> y ---> o -----> x ------> v Say {green, yellow, orange, color of x} is one subset of P i-1(x). Let's call it R. Remember? P i-1(x) can have many different sets. It could look like that: {{green, red, black, yellow}, {pink, orange, gray, yellow},{...}}!
Ok so what exactly is the relation with R and x and v? R is a set of colors that denote a colored path of length i-1 that leads up to x. If the vertex v, which is a neighbor of x, has a color that hasn't appeared yet in R, then we can add it to R. But now R has gained one color. It's size, |R|, is now i+2. Seems like this must be one of the new subsets of P i(v)! Why v now? Well, we have extended the path by one color so it better be saved in the according set! So what have we seen so far:

You have a set P i(v), which contains subsets S, which itself contain i+1 many colors (don't forget v)
If you have a set P k-1(v), you have a path of length k and you can drink a beer.
P i(v) can be computed by P i-1(x), where x is a neighbor of v! And the good part is that the only thing you have to check is whether or not the color of v appeared in one of the subsets of P i-1(x), which we called R.

How do you compute it from the beginning? You start at P 0(v), which is just the color of v. Then for each neighbor x of v, you compute P 1(v). P 1(v) is just P 1-1(x) if you recall the i-1 thing. P 0(x) is again just the color of x. If the color of x and v are not the same, they just formed the first subset of P 1(v)! P 2(v) is then computed by computing P 1(x), which is again computed by P 0(y), where y is a neighbor of x. This continues as long as we haven't reached P k-1(v).

As for complexity: This runs in O(2^k km). Where m is the number of edges and k the length of the path. So now we are able to use this algorithm in polynomial time to search for paths that are k = log n long. If it is bigger than that, it's not polynomial anymore, unfortunately.

So, now we have an algorithm that finds a "long" path in polynomial time. But,... wait. It can only do so if the path is colored! I don't know where you live, but in my world graphs are not colored by default and especially not paths with different colors.

We have to do it. What is the probability that we color a path of k-length with k different colors?

There are k! many ways to color a graph with k different colors. But there are k^k different ways to color a path with k different colors, where they can occur multiple times. Example: With yellow = y and green = g you have 2!=2 options, where the colors have to be different: (y,g) or (g,y). When the colors don't have to be different, you have k^k = 2^2 = 4 options. (y,y),(g,g) and the ones you already saw. So Pr[Path is colored with diff. colors]=k!/k^k, which is bigger than e^-k, which is the same as 1/e^k. So you agree that the probability is very small, right? What is the expected number of tries to get the first success? This is a geometric distribution with expectation value = 1/p = e^k. So we think that after e^k tries we can hope for the first time we have a colored path. Sometimes it's less, sometimes it's more. The prob. of failure at one try is 1-e^-k, so very big. BUT if we say execute this Te^k times, the prob. of Te^k consecutive failures becomes very small: (1-e^-k)^Te^k <= (e^-e^-k)^Te^k <= e^-T So the prob. that we will succeed after Te^k tries, is bigger than 1-e^-T. And that is very small.

How does the algorithm look now?

Color the graph randomly with k different colors.
Execute the algorithm that checks if there is a colored path If there is one return it. If not just continue.
Repeat steps 1 and 2 for Te^k times. (It's fun, believe me). It's actually not. Let the computer do it.

This type of algorithm is called a Monte Carlo type randomized algorithm. Its runtime is in O(Te^k2^kkm) and the probability of a false "No there is not a path (but actually there is one)" is smaller than e^-T (again, very small.) And again, for k = log n, this randomized algorithm achieves polynomial runtime!