python ruby data-mining data-analysis

Data Mining: grouping based on two text values (IDs) and one numeric (ratio)


For a music project, I want to find which groups of artists users listen to. I have extracted three columns from the database: the ID of the artist, the ID of the user, and the share of that user's streams that goes to that artist. For example, if half of the plays from user 15 are of artist 12, the row is:

12 | 15 | 0.5

What I hope to find is a method for clustering artists into groups, e.g. to find out that users who tend to listen to artist 12 also listen to 65, 74, and 34.

I wonder what kinds of methods can be used for this grouping, and whether there are any good sources on this approach (Python or Ruby would be great).


Solution

  • Sounds like a classic matrix factorization task to me,

    with a weighted matrix instead of a binary one. Some fast algorithms may therefore not be applicable, because they support binary matrices only.

    Don't ask for sources on Stack Overflow: asking for off-site resources (tools, libraries, ...) is off-topic.
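
To make the factorization idea concrete, here is a minimal sketch in plain Python. It is one possible reading of the suggestion, not a definitive implementation. It assumes that a missing (artist, user) pair means a ratio of 0, uses the standard multiplicative-update rules for non-negative matrix factorization, and assigns each artist to the factor where it has the largest weight. The helper names (`build_matrix`, `nmf`, `group_artists`), the number of factors `k`, and the sample rows are all made up for illustration.

```python
import random


def build_matrix(triples):
    """Turn (artist_id, user_id, ratio) rows into a dense users x artists matrix.

    Assumption: a pair that does not appear in the data has ratio 0.
    """
    users = sorted({u for _, u, _ in triples})
    artists = sorted({a for a, _, _ in triples})
    u_idx = {u: i for i, u in enumerate(users)}
    a_idx = {a: j for j, a in enumerate(artists)}
    V = [[0.0] * len(artists) for _ in users]
    for a, u, r in triples:
        V[u_idx[u]][a_idx[a]] = r
    return V, users, artists


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def nmf(V, k=2, n_iter=500, seed=0, eps=1e-9):
    """Approximate V (users x artists) as W @ H with non-negative factors.

    W is users x k, H is k x artists. Uses multiplicative updates that
    minimise the squared reconstruction error.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def group_artists(H, artists):
    """Assign each artist to the factor where its weight is largest."""
    groups = {}
    for j, artist in enumerate(artists):
        best = max(range(len(H)), key=lambda f: H[f][j])
        groups.setdefault(best, []).append(artist)
    return sorted(groups.values())


# Made-up sample rows: (artist_id, user_id, ratio of that user's streams).
triples = [
    (12, 15, 0.5), (65, 15, 0.2), (74, 15, 0.2), (34, 15, 0.1),
    (12, 16, 0.4), (65, 16, 0.3), (74, 16, 0.2), (34, 16, 0.1),
    (12, 17, 0.3), (65, 17, 0.3), (74, 17, 0.3), (34, 17, 0.1),
    (99, 20, 0.6), (88, 20, 0.4),
    (99, 21, 0.5), (88, 21, 0.5),
]

V, users, artists = build_matrix(triples)
W, H = nmf(V, k=2)
groups = group_artists(H, artists)
print(groups)
```

The number of factors `k` is a modelling choice and would have to be tuned on real data. The nested-list arithmetic here is only meant to show the mechanics; a real user-by-artist matrix is large and sparse, so it would need a numeric library and a sparse representation.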