
Creating a Query Module in Memgraph


I'm using Memgraph Lab for a project about music genres, and I imported a dataset structured roughly like this: it contains 2k users, each defined by an id and a list of genres they like. The genres are listed in the order the user added them, and the edges represent mutual friendships between users. First I wanted to count how many users like each genre, and for a single genre I managed to do that with this query:

MATCH (n)
WITH n, "Pop" AS genre
WHERE genre IN n.genres
RETURN genre, count(n);
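The query above counts one genre at a time. In Cypher, UNWIND can expand each user's list so that all genres are grouped in one query (for example MATCH (n) UNWIND n.genres AS genre RETURN genre, count(n);). The same grouping is sketched below in plain Python, which can be handy for checking expected counts on a small sample. The users data and the count_genres helper are hypothetical illustrations, not part of the original dataset.

```python
from collections import Counter

# Hypothetical sample data: each user has an ordered "genres" list,
# mirroring the node properties described in the question.
users = [
    {"id": 1, "genres": ["Pop", "Rock", "Jazz"]},
    {"id": 2, "genres": ["Rock", "Pop"]},
    {"id": 3, "genres": ["Jazz", "Pop"]},
    {"id": 4, "genres": []},
]


def count_genres(users):
    """Return how many users list each genre, in a single pass."""
    counter = Counter()
    for user in users:
        # set() counts each user at most once per genre, like the IN query
        # above (a plain UNWIND would count a duplicated entry twice).
        counter.update(set(user.get("genres", [])))
    return counter


print(sorted(count_genres(users).items()))
# → [('Jazz', 2), ('Pop', 3), ('Rock', 2)]
```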

My issue: if we assume that users picked their genres in order of preference, I want to create a query or a query module that tells us, for each genre, in what percentage of cases it appears among a user's top n choices. I'm stuck on how to build that.


Solution

  • I don't know of a single query that does this, but you can make it easier for yourself by creating a query module and implementing all of it there. Something like this should work:

    import mgp
    from collections import defaultdict


    @mgp.read_proc
    def genre_count(context: mgp.ProcCtx,
                    genre: str) -> mgp.Record(genre=str, count=int):
        # Count users whose "genres" list contains the given genre;
        # users without a "genres" property are skipped instead of failing.
        count = sum(1 for v in context.graph.vertices
                    if genre in v.properties.get('genres', []))
        return mgp.Record(genre=genre, count=count)


    @mgp.read_proc
    def in_top_n_percentage(context: mgp.ProcCtx,
                            n: int) -> mgp.Record(genre=str,
                                                  percentage=float,
                                                  size=int):
        # For each genre, track how often it occurs at all and how often it
        # occurs in one of the first n positions of a user's ordered list.
        counts_by_genre = defaultdict(
            lambda: {'total_count': 0, 'in_top_n_count': 0})

        for v in context.graph.vertices:
            for index, genre in enumerate(v.properties.get('genres', [])):
                counts_by_genre[genre]['total_count'] += 1
                if index < n:
                    counts_by_genre[genre]['in_top_n_count'] += 1

        # "percentage" is returned as a fraction in [0, 1] (multiply by 100
        # for a 0-100 scale); "size" is the genre's total number of occurrences.
        return [
            mgp.Record(genre=genre,
                       percentage=counts['in_top_n_count'] / counts['total_count'],
                       size=counts['total_count'])
            for genre, counts in counts_by_genre.items()
        ]
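Since the mgp module is normally only available inside Memgraph's embedded Python runtime, one way to sanity-check the counting logic is to run the same loop against stand-in vertex objects that only expose a properties mapping. This is a minimal sketch under that assumption: top_n_share is a hypothetical helper that copies the procedure's loop, and SimpleNamespace objects stand in for mgp vertices. It also makes the metric concrete: with n = 1, Jazz occurs twice and is first once, so its share is 0.5.

```python
from collections import defaultdict
from types import SimpleNamespace


def top_n_share(vertices, n):
    """Same counting logic as in_top_n_percentage, without mgp.

    Returns {genre: (fraction_in_top_n, total_count)}.
    """
    counts = defaultdict(lambda: {"total_count": 0, "in_top_n_count": 0})
    for v in vertices:
        for index, genre in enumerate(v.properties.get("genres", [])):
            counts[genre]["total_count"] += 1
            if index < n:
                counts[genre]["in_top_n_count"] += 1
    return {
        genre: (c["in_top_n_count"] / c["total_count"], c["total_count"])
        for genre, c in counts.items()
    }


# Stand-in vertices: each one only carries a .properties dict.
vertices = [
    SimpleNamespace(properties={"genres": ["Pop", "Rock", "Jazz"]}),
    SimpleNamespace(properties={"genres": ["Rock", "Pop"]}),
    SimpleNamespace(properties={"genres": ["Jazz", "Pop", "Rock"]}),
    SimpleNamespace(properties={}),  # user without a genres property
]

for genre, (share, size) in sorted(top_n_share(vertices, 1).items()):
    print(genre, round(share, 3), size)
# → Jazz 0.5 2
# → Pop 0.333 3
# → Rock 0.333 3
```

Inside Memgraph, procedures are called as module.procedure, where the module name is the file name. Assuming the module were saved as a hypothetical genres.py, a call could look like CALL genres.in_top_n_percentage(3) YIELD genre, percentage, size RETURN genre, percentage, size ORDER BY percentage DESC;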