Search code examples
searchlucenesocial-networkingentity-relationshipelgg

what's the best way to search a social network by prioritizing a users relationships first?


I have a social network set up and via an api I want to search the entries. The database of the social network is mysql. I want the search to return results in the following format: Results that match the query AND are friends of the user performing the search should be prioritized over results that simply match the query.

So can this be done in one query or will I have to do two separate queries and merge the results and remove duplicates?

I could possibly build up a data structure using Lucene and search that index efficiently, but am wondering if the penalty of updating a document everytime a new relationship is created is going to be too much?

Thanks


Solution

  • The reference to Lucene complicates the equation a little bit. Let's solve it (or at least get a baseline) without it first.

    Assuming the following datamodel (or something approaching.

    tblUsers
      UserId  PK
      UserName
      Age
      ...
    
    tblBuddies
      UserId     FK to tblUsers.UserId
      FriendId   tblUsers.Userid  = Id of one of the friends
      BuddyRating     float 0.0 to 1.0 (or whatever normalized scale) indicating 
                      the level of friendship/similarity/whatever
    
    tblItems
      ItemId  PK
      ItemName
      Description
      Price
      ...
    
    tblUsersToItems
       UserId   FK to tblUsers.UserId
       ItemId   FK to 
       ItemRating   float 0.0 to 1.0 (or whatever normalized scale) indicating 
                    the "value" assigned to item by user.
    

    A naive query (but a good basis for an optimized one) could be:

    SELECT [TOP 25]  I.ItemId, ItemName, Description, SUM(ItemRating * BuddyRating)
    FROM tblItems I
    LEFT JOIN tblUserToItems UI ON I.ItemId = UI.ItemId
    LEFT JOIN tblBuddies B ON UI.UserId = B.FriendId
    WHERE B.UserId = 'IdOfCurrentUser'
      AND SomeSearchCriteria -- Say ItemName = 'MP3 Player'
    GROUP BY I.ItemId, ItemName, Description
    ORDER BY SUM(ItemRating * BuddyRating) DESC
    

    The idea is that a given item is given more weight if it is recommended/used by a friend. The extra weigh is the more important if the friend is a a close friend [BuddyRating] and/or if the friend recommend this item more strongly [ItemRating]

    Optimizing such a query depends on the overal number of item, the average/max numbers of buddies a given user has, the average/max number of items a user may have in his/her list.

    Is this type of ideas/info you are seeking or am I missing the question?