Search code examples
algorithmmahout

Mahout algorithm advice


What I need is actually just a hint where I can start.

I'm somewhat familiar to Mahout, at least theoretically. I know how it work, how to set it up, etc, and I could build a simple recommendation system based in collaborative filtering.

However, now I'm trying to do something more complex and even after reading quite some about different algorithms, I'm not sure which direction to go.

Quickly what I want to do is:

The final goal is to define one scalar (a "score") of each one of a set of entities based on some "known" entities. The entities interact with each other, known scores influence and define the unknown ones. You can imagine with the following example.

I have a lot if white clothes and a few pieces of colorful ones; red, blue, green... I put them into the washing machine. I want to know what colors the white ones will get after the wash.

Things to take into account:

  • we make a series of washing with different "actors"... some clothes are washed in the 1st and 3rd washing, some of them only in the 2nd, some of them are washed in all
  • in consecutive washes the clothes that were white before but now colored also influence the rest, but not as strong (as they are not as colored)
  • some colors don't "color" as much as others. for example red has a strong effect on most of the clothes, but green not so much
  • coloring effect also depends on how many clothes are in one washing. If you wash a red shirt with a white t-shirt, it gets much more colored, than if there were 100 other white t-shirt
  • clothes don't "lose" their color when influencing others

You can see that while calculating, entities actually have 2 assigned scalars:

  • the color hue (this also defines "coloring power" as mentioned above). The hue can be represented as a number, from 0 to 1, let's say. The coherence between the coloring power and the color number is not linear. It is more like the ends of the scale have more coloring power (0 and 1) while the middle (0.5) has less
  • the color "lightness" (how much an entity is colored, for originally colored clothes it's 1, for white ones it's 0), which in the same time also defines coloring power regardless of the hue

So, again, what I know:

  • which clothes where washed in which consecutive washing
  • I know the original color of some of them, the rest is white in the beginning

What I want to know: - the hue of all clothes in the end of the washing

The problem is that I don't know what (type) of algorithm should I start with. If you were so kind to read so far, please suggest me something (or further reading).

Obviously I don't ask for any detailed thing, again, only hints.

Thank you!


Solution

  • The only thing I can think of that sounds like this problem is PageRank. It's computed by a sort of iterative simluation. Each page has some influence (color) which flows via its links (socks its washed with) and at some point the page influence reaches a steady state (final color). You can look up PageRank algorithms but it is essentially a matter of calculating eigenvectors of a big, erm, sock color matrix.