php mysql database graph-databases relevance

Creating a Facebook-like search within my website

Facebook has Graph Search. Users can search for, say, people who like cycling and are from London, friends of friends who like yoga, or photos of friends or boyfriends from a certain month or year.

All this data is extracted from a single search input with no filter fields.

I am trying to build something similar in PHP, but I can't tell exactly how it might be implemented.

Is this done purely through database design in a plain RDBMS, through graph node structures that are logically linked to database tables by keywords, through a mix of RDBMS and NoSQL, or through some other approach? As for the text input itself, there must be some parsing and matching against specific keywords to work out what the search refers to and route it to the right query.

What is the best practice for building a PHP graph search (or at least something similar) on my website, which is something like a retail e-commerce system with grouped, related data?

Solution

You could solve for each of your examples separately, but it could prove tedious, and you'd likely run into a wall in terms of performance.

People who like cycling and are from London (SQL)

    SELECT users.id
      FROM users
      JOIN posts     ON posts.author_id = users.id
      JOIN topics    ON topics.id = posts.topic_id
      JOIN locations ON locations.id = users.location_id
     WHERE locations.city = 'London'
       AND topics.name = 'cycling'
     GROUP BY users.id
     ORDER BY COUNT(posts.id) DESC;

(using a really loose definition of 'liking cycling', and being 'from London')
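
To try the relational version on a small scale, here is a minimal sketch in Python using the standard `sqlite3` module. The schema, the sample rows, and the helper name `users_by_topic_and_city` are assumptions made up for illustration; only the table and column names come from the query above. The same parameterized query could be run from PHP through PDO.

```python
import sqlite3

# Hypothetical toy schema, reusing the table and column names from the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE topics    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users     (id INTEGER PRIMARY KEY, location_id INTEGER);
CREATE TABLE posts     (id INTEGER PRIMARY KEY, author_id INTEGER, topic_id INTEGER);

INSERT INTO locations VALUES (1, 'London'), (2, 'Paris');
INSERT INTO topics    VALUES (1, 'cycling'), (2, 'yoga');
INSERT INTO users     VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO posts     VALUES (1, 1, 1), (2, 2, 1), (3, 2, 1), (4, 3, 1), (5, 1, 2);
""")


def users_by_topic_and_city(conn, topic, city):
    """Return ids of users in `city` who posted on `topic`, most posts first."""
    sql = """
        SELECT users.id
          FROM users
          JOIN posts     ON posts.author_id = users.id
          JOIN topics    ON topics.id = posts.topic_id
          JOIN locations ON locations.id = users.location_id
         WHERE locations.city = ?
           AND topics.name = ?
         GROUP BY users.id
         ORDER BY COUNT(posts.id) DESC
    """
    # Bound parameters keep user-supplied search terms out of the SQL text.
    return [row[0] for row in conn.execute(sql, (city, topic))]


london_cyclists = users_by_topic_and_city(conn, "cycling", "London")
print(london_cyclists)
```

Every new kind of search (friends of friends, photos by month, and so on) would need its own hand-written query along these lines, which is where the tedium comes from.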

Relational databases don't handle queries with lots of joins particularly gracefully, and every extra relationship you want to search across adds another join. Your performance is going to suffer under load or with a large dataset.

However, in a graph database (like Neo4j or Titan), you could traverse a graph of related entities and collect matching entity nodes in a much more generic way, in an environment optimized for serving the type of use cases you're thinking about.

Same query (Cypher - Neo4j)

    MATCH (topic:Topics {name: 'cycling'})
          <-[:POST_TOPIC]-(post:Posts)
          -[:AUTHORED_BY]->(user:Users)
    WHERE (user)-[:RESIDENT_OF]->(:Location {city: 'London'})
    RETURN user.id AS user_id, count(post) AS post_count
    ORDER BY post_count DESC

Queries like this can also be expressed as Gremlin traversals (for Titan and other graph databases), but those quickly become verbose and hard to decipher.

There are generic ways to approach the Facebook-style graph search relevance you describe. In your case, it sounds like you probably want personalized search, e.g. over all the related vertices within a few degrees of separation of the searcher (using whatever edge relationships you have: location, interests, friends, etc.).
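
As a rough illustration of "within a few degrees of separation", here is a minimal in-memory sketch in Python. The toy graph, the vertex names, and the function `within_degrees` are all hypothetical; a graph database would do this traversal for you over indexed, typed edges. The sketch only shows the idea of collecting nearby vertices and using distance from the searcher as a simple relevance signal.

```python
from collections import deque

# Hypothetical toy graph: each key is a vertex, each value lists the vertices
# it is linked to (friends, interests, and locations are mixed together here).
EDGES = {
    "alice": ["bob", "cycling", "london"],
    "bob": ["alice", "carol", "yoga"],
    "carol": ["bob", "dave", "cycling"],
    "dave": ["carol"],
    "cycling": ["alice", "carol"],
    "yoga": ["bob"],
    "london": ["alice"],
}


def within_degrees(edges, start, max_depth):
    """Breadth-first search: map each vertex within max_depth hops to its distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if distances[vertex] == max_depth:
            continue  # do not expand beyond the allowed number of hops
        for neighbour in edges.get(vertex, []):
            if neighbour not in distances:
                distances[neighbour] = distances[vertex] + 1
                queue.append(neighbour)
    del distances[start]  # the searcher is not a search result
    return distances


reachable = within_degrees(EDGES, "alice", 2)
# Nearest vertices first; ties are broken alphabetically for a stable order.
ranked = sorted(reachable, key=lambda v: (reachable[v], v))
print(ranked)
```

In a real system you would probably weight edge types differently (a friend may matter more than a shared city), but hop count is a reasonable first approximation.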

If you can't easily enumerate all the use cases you want to build today, you'll probably be happier with a graph database, so you can experiment with your ideas, and launch them into production without having to cut corners for performance reasons.