Search code examples
phpgraph-visualizationvis.js

visualize genres of movies


Using the example table below, I would like to show:

  • a) nodes: that there are many movies that have "action" or "drama" as a genre
  • b) edges (non-directional): when a movie has "drama" as a genre, that is is likely the movie also has the genre "action"

My main question: how can I best create a list of all the edges consisting of all connected genres?

Let's say I have a table with movies and genres:

GENRE       | MOVIE
--------------------------
Drama       | A
Action      | A
Comedy      | A

Documentary | B
Romantic    | B
Action      | B
Drama       | B

Drama       | C
Romantic    | C
Action      | C
---------------------------

I do not have a preference for visualization framework, but the following comes close to what I had in mind: http://visjs.org/examples/network/09_sizing.html

Other suggestions to visualize are more than welcome!

Based on my movie example, the nodes and edges could look like: http://jsfiddle.net/wivaku/90oef0pg/

example of graph

In this example the edges are hardcoded. In real life I would like to create them dynamically. How can I best create the edges JSON, preferably using PHP?

The PHP snippet I have at the moment:

<?php
//the SQL rows (normally from SQL, now static):
$rows = json_decode('[["Drama","A"],["Action","A"],["Comedy","A"],["Documentary","B"],["Romantic","B"],["Action","B"],["Drama","B"],["Drama","C"],["Romantic","C"],["Action","C"]]');

$nodes = array();
$edges = array();

// create nodes
$genres = array_count_values(array_map(function($i) {return $i[0]; }, $rows));
foreach ($genres as $key => $value) {
    $nodes[] = array("id"=>$key, "value"=>$value);
}

// create edges
// helpful to have genres grouped by movie? (normally from SQL, now static)
$movieGenres = json_decode('[{"movie":"A","genres":["Drama","Action","Comedy"]},{"movie":"B","genres":["Documentary","Romantic","Action","Drama"]},{"movie":"C","genres":["Drama","Romantic","Action"]}]');
// ...

print json_encode(["nodes"=>$nodes, "edges"=>$edges], JSON_NUMERIC_CHECK);
?>

Thanks in advance!

Update: regarding the comments about SQL details / options. The table I have is pretty much as listed. So: genreId and contentId. One option I was exploring (as shortcut for the PHP code): concatenate the genres per movie.

SELECT GROUP_CONCAT(genreId SEPARATOR "|") AS genres
FROM contentGenres
GROUP BY contentId
ORDER BY count(genreId) DESC

With the example data:

Drama|Action|Comedy
Documentary|Romantic|Action|Drama
Drama|Romantic|Action

Or using the genre ID's:

1|2|3
4|5|2|1
1|5|2

The result of my real data set is ±11000 rows, with some movies having 8 genres.


Solution

  • You can do the processing at the SQL level, for instance using this query:

    SELECT a.genreId,b.genreId,count(*)
    FROM genres as a, genres as b
    WHERE a.contentId = b.contentId AND a.genreId < b.genreId
    GROUP BY a.genreId, b.genreId
    

    see an online demo here

    The id's are numbered as the genres in your example:

    1 Drama
    2 Action
    3 Comedy
    4 Documentary
    5 Romantic