I'm using Memgraph Lab for a project related to music genres, and I imported a dataset structured like this: it contains 2k users, each defined by an id and a list of genres they love. The edges represent mutual friendship between users. The genres are listed in the order the users added them. First I wanted to count how many users list a given genre (here "Pop"), and managed to do that by running this query:
MATCH (n)
WITH n, "Pop" AS genre
WHERE genre IN n.genres
RETURN genre, count(n);
My issue now is this: if we assume that users picked the genres in order of preference, I want to create a query or a query module that tells us what percentage of the time each genre appears in the top n places, and I'm stuck on writing it.
I'm not sure about the exact query, but you can make it easier for yourself by creating a query module and implementing all of this there. I suppose something like this would work:
import mgp
from collections import defaultdict


@mgp.read_proc
def genre_count(context: mgp.ProcCtx,
                genre: str) -> mgp.Record(genre=str, count=int):
    # Count the vertices whose 'genres' list contains the given genre.
    count = len(
        [v for v in context.graph.vertices if genre in v.properties['genres']])
    return mgp.Record(genre=genre, count=count)


@mgp.read_proc
def in_top_n_percentage(context: mgp.ProcCtx,
                        n: int) -> mgp.Record(genre=str,
                                              percentage=float,
                                              size=int):
    # Per genre: how often it appears at all, and how often within the first n places.
    genre_counts = defaultdict(lambda: {'total_count': 0, 'in_top_n_count': 0})

    for v in context.graph.vertices:
        for index, genre in enumerate(v.properties['genres']):
            genre_counts[genre]['total_count'] += 1
            # index is 0-based, so index < n means "in the top n"; True adds 1.
            genre_counts[genre]['in_top_n_count'] += index < n

    def get_record(genre, counts):
        return mgp.Record(
            genre=genre,
            percentage=counts['in_top_n_count'] / counts['total_count'],
            size=counts['total_count']
        )

    return [get_record(genre, counts)
            for genre, counts in genre_counts.items()]
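If you want to sanity-check the counting logic before loading it into Memgraph, here is a minimal sketch of the same computation on plain Python lists. The function name `top_n_share` and the sample data are made up for illustration; they are not part of `mgp` or of your dataset. It uses the same definition as the procedure above: for each genre, the share of its occurrences that fall within the first n positions.

```python
from collections import defaultdict


def top_n_share(users_genres, n):
    """Return {genre: (share_in_top_n, total_occurrences)}.

    users_genres is a list of per-user genre lists, ordered by preference.
    """
    counts = defaultdict(lambda: {"total": 0, "in_top_n": 0})
    for genres in users_genres:
        for index, genre in enumerate(genres):
            counts[genre]["total"] += 1
            if index < n:  # 0-based index, so index < n means "top n"
                counts[genre]["in_top_n"] += 1
    return {
        genre: (c["in_top_n"] / c["total"], c["total"])
        for genre, c in counts.items()
    }


# Hypothetical sample data: three users and their ordered genre lists.
sample = [
    ["Pop", "Rock", "Jazz"],
    ["Rock", "Pop"],
    ["Jazz", "Metal", "Pop"],
]
result = top_n_share(sample, 2)
```

Note that the value is a fraction between 0 and 1, so multiply by 100 if you want an actual percentage. As for calling the module: if it were saved as, say, `genres.py` (a hypothetical file name) in Memgraph's query modules directory, I'd expect something like `CALL genres.in_top_n_percentage(3) YIELD genre, percentage, size RETURN genre, percentage, size;` to work.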