How to create suggestions in a database?


I want to develop a small music library application. My goal is to add a suggestions feature for users:

  • A user adds music to the application. The user is not logged in at all; usage is anonymous.
  • When a user opens or closes the application, we send their library to our database to collect information about new music tracks (only).
  • When a user clicks on suggestions, I want to query the database and compare their library with it. I want to find the music that similar users, who listen to the same music as them, listen to.

My idea was to create a link between each pair of tracks that stores the percentage of users who have both tracks. If this percentage is high, we can suggest the second track to users who listen to the first one.
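The pairwise-link idea above can be sketched in a few lines of Python. This is a minimal sketch, not a finished design: it assumes each uploaded library is just a list of track ids, and it interprets "percentage of users who got those two musics" as the percentage of users who have track A that also have track B, so links are directional. The names `build_links` and `suggest` and the 50% threshold are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def build_links(libraries):
    """Return links[a][b] = % of libraries containing track a that also contain b."""
    owners = defaultdict(int)  # track -> number of libraries containing it
    pair_counts = defaultdict(lambda: defaultdict(int))  # a -> b -> libraries with both
    for library in libraries:
        tracks = set(library)  # ignore duplicate entries within one library
        for t in tracks:
            owners[t] += 1
        for a, b in permutations(tracks, 2):  # every ordered pair of distinct tracks
            pair_counts[a][b] += 1
    return {
        a: {b: 100.0 * n / owners[a] for b, n in counts.items()}
        for a, counts in pair_counts.items()
    }


def suggest(library, links, threshold=50.0):
    """Suggest unowned tracks linked to an owned track at >= threshold percent."""
    owned = set(library)
    scores = {}
    for a in owned:
        for b, pct in links.get(a, {}).items():
            if b not in owned and pct >= threshold:
                scores[b] = max(scores.get(b, 0.0), pct)  # keep the strongest link
    # Strongest links first; ties broken by track id for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))


if __name__ == "__main__":
    libraries = [["A", "B", "C"], ["A", "B"], ["A", "D"], ["C", "D"]]
    links = build_links(libraries)
    print(round(links["A"]["B"], 1))  # 66.7: 2 of the 3 users with A also have B
    print(suggest(["A"], links))      # ['B']
```

Note that the number of pairs grows quadratically with library size (a 1,000-track library yields about a million ordered pairs), so in practice you might only store pairs whose count passes some minimum.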

I need help finding documentation about this type of database design, one without any notion of user accounts. I have to compare a user's library with a large list of music. I've found that this is called item-based recommendation. Am I on the right track?


Solution

  • Whether a user has a particular song in their library, or even plays it, can be a misleading signal of taste. Often, sample music comes with an operating system or music player and the user doesn't care enough to remove it, and it can be hard for a program to tell music apart from other sound files. Or someone may have downloaded a song because it seemed interesting on paper, or got it as part of an album they liked as a whole, then ended up not liking that song but never deleted it.

    One time I set Windows Media Player to shuffle all the music on my computer. To my surprise, I heard punch sound effects, music I had never heard before (from artists I had never heard of, in genres I don't listen to), and even Windows click sounds that confused me because I wasn't clicking anything.

    I mention all this to point out that you may want to base similarity on more than which users appear to have the same music. Maybe you could have users rate the songs they listen to, and compare not only the songs in their libraries but also their ratings of those songs. If two users have exactly the same songs, but one hates every song the other likes and vice versa, they don't really have similar tastes.

    I would define a UDF (user-defined function) that compares two users' tastes: for each song user 1 has, it ignores the song if user 2 doesn't have it; otherwise it subtracts the absolute difference of their two ratings from the maximum rating. It then adds all these values together.
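    Here is a minimal Python sketch of the comparison function described above, written as a plain function rather than a database UDF. It assumes ratings are integers on a 1 to 5 scale (`MAX_RATING = 5` is an assumption) and that each user's ratings are a dict of song id to rating; `taste_similarity` is a hypothetical name.

```python
MAX_RATING = 5  # assumption: songs are rated on a 1-5 scale


def taste_similarity(ratings1, ratings2, max_rating=MAX_RATING):
    """Compare two users' tastes.

    ratings1 / ratings2 map song id -> rating. A song user 1 has but user 2
    lacks is ignored; a shared song adds max_rating - |r1 - r2|, so identical
    ratings add the most and opposite ratings add the least.
    """
    total = 0
    for song, r1 in ratings1.items():
        r2 = ratings2.get(song)
        if r2 is None:
            continue  # user 2 doesn't have this song: ignore it
        total += max_rating - abs(r1 - r2)
    return total


if __name__ == "__main__":
    alice = {"s1": 5, "s2": 1}
    bob = {"s1": 5, "s2": 1}    # same songs, same opinions
    carol = {"s1": 1, "s2": 5}  # same songs, opposite opinions
    print(taste_similarity(alice, bob))    # 10
    print(taste_similarity(alice, carol))  # 2
```

    Because only shared songs count, the raw sum also grows with how many songs two users share; you might want to normalize it if library sizes vary a lot.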

    Then I would run this UDF for every pair of users, pick each user's top few matches, and suggest the songs those matches have rated highly.
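    The pairwise step can be sketched in Python as follows, under the same assumptions as before (1 to 5 ratings, dicts of song id to rating). The function names, the `k` cutoff, and treating a rating of 4 or more as "highly rated" are all hypothetical choices. Users with no songs in common are skipped so they are never chosen as matches.

```python
from itertools import combinations

MAX_RATING = 5  # assumption: 1-5 ratings


def taste_similarity(a, b, max_rating=MAX_RATING):
    # Shared songs only; each adds max_rating - |difference of ratings|.
    return sum(max_rating - abs(r - b[s]) for s, r in a.items() if s in b)


def top_suggestors(all_ratings, k=3):
    """For each user, return up to k other users with the highest similarity."""
    candidates = {u: [] for u in all_ratings}
    for u, v in combinations(all_ratings, 2):  # each unordered pair once
        s = taste_similarity(all_ratings[u], all_ratings[v])
        if s > 0:  # skip users with nothing in common
            candidates[u].append((s, v))
            candidates[v].append((s, u))  # the score is symmetric
    return {
        u: [v for _, v in sorted(pairs, key=lambda p: (-p[0], p[1]))[:k]]
        for u, pairs in candidates.items()
    }


def suggest_songs(user, all_ratings, suggestors, min_rating=4):
    """Songs the user's suggestors rated >= min_rating that the user lacks."""
    own = all_ratings[user]
    picks = set()
    for v in suggestors[user]:
        for song, r in all_ratings[v].items():
            if r >= min_rating and song not in own:
                picks.add(song)
    return sorted(picks)


if __name__ == "__main__":
    ratings = {
        "alice": {"s1": 5, "s2": 1},
        "bob": {"s1": 5, "s2": 1, "s3": 5},
        "carol": {"s1": 1, "s2": 5, "s4": 5},
    }
    sugg = top_suggestors(ratings, k=1)
    print(sugg)                                   # alice's best match is bob
    print(suggest_songs("alice", ratings, sugg))  # ['s3']
```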

    This will take a long time, particularly with a large number of users, because the number of user pairs grows quadratically. So you can also create a Suggestors table that stores each user's most similar users, and update it (that is, truncate and then rebuild it) with the process above daily, weekly, monthly, or whatever fits your situation. When a user asks for suggestions, the feature would then only need to look up that user's suggestors' highly rated songs. That takes substantially less time, while still keeping results fairly up to date with additions and changes to users' libraries.
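    A minimal sketch of the Suggestors table using Python's built-in `sqlite3` module, assuming a hypothetical schema with a `Ratings(user_id, song_id, rating)` table; the column names, `TOP_K`, and the rating threshold are assumptions. SQLite has no `TRUNCATE`, so the rebuild uses `DELETE FROM` inside one transaction, and the similarity is computed as a self-join on shared songs. The ranking step uses `ROW_NUMBER()`, a window function that requires SQLite 3.25 or later.

```python
import sqlite3

MAX_RATING = 5  # assumption: ratings are 1-5
TOP_K = 3       # hypothetical: how many suggestors to keep per user

SCHEMA = """
CREATE TABLE IF NOT EXISTS Ratings (
    user_id TEXT NOT NULL, song_id TEXT NOT NULL, rating INTEGER NOT NULL,
    PRIMARY KEY (user_id, song_id)
);
CREATE TABLE IF NOT EXISTS Suggestors (
    user_id TEXT NOT NULL, suggestor_id TEXT NOT NULL, score INTEGER NOT NULL,
    PRIMARY KEY (user_id, suggestor_id)
);
"""


def rebuild_suggestors(conn, top_k=TOP_K, max_rating=MAX_RATING):
    """Empty and refill Suggestors; meant to run as a periodic batch job."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM Suggestors")  # SQLite's equivalent of TRUNCATE
        conn.execute(
            """
            INSERT INTO Suggestors (user_id, suggestor_id, score)
            SELECT user_id, other_id, score FROM (
                SELECT user_id, other_id, score,
                       ROW_NUMBER() OVER (
                           PARTITION BY user_id ORDER BY score DESC, other_id
                       ) AS rn
                FROM (
                    -- similarity over shared songs only
                    SELECT a.user_id AS user_id, b.user_id AS other_id,
                           SUM(? - ABS(a.rating - b.rating)) AS score
                    FROM Ratings a
                    JOIN Ratings b
                      ON a.song_id = b.song_id AND a.user_id <> b.user_id
                    GROUP BY a.user_id, b.user_id
                )
            )
            WHERE rn <= ?
            """,
            (max_rating, top_k),
        )


def suggestions_for(conn, user_id, min_rating=4):
    """Cheap request-time lookup: songs the user's suggestors rated highly."""
    rows = conn.execute(
        """
        SELECT DISTINCT r.song_id
        FROM Suggestors s
        JOIN Ratings r ON r.user_id = s.suggestor_id
        WHERE s.user_id = ? AND r.rating >= ?
          AND r.song_id NOT IN (SELECT song_id FROM Ratings WHERE user_id = ?)
        ORDER BY r.song_id
        """,
        (user_id, min_rating, user_id),
    ).fetchall()
    return [song for (song,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO Ratings VALUES (?, ?, ?)",
        [("alice", "s1", 5), ("alice", "s2", 1),
         ("bob", "s1", 5), ("bob", "s2", 1), ("bob", "s3", 5),
         ("carol", "s1", 1), ("carol", "s2", 5), ("carol", "s4", 5)],
    )
    rebuild_suggestors(conn, top_k=1)
    print(suggestions_for(conn, "alice"))  # ['s3']
```

    The expensive self-join only runs in `rebuild_suggestors`, which you would schedule as often as fits; `suggestions_for` touches only the few rows of one user's suggestors.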