Search code examples
memgraphdb

How to dedup.group by the output based on vertex property?


I have a use case where I need to dedup.group by the output based on vertex property, but still output all the properties of the vertex. e.g. Imagine you have a shop which has products and you want to return from each product category one product (fullfilling the where condition, but if more than one match then random). How can you achive this within cypher please? Simple example of the query, which does the dedup, but oviously return only the category

MATCH (s:Shop)-[:SELL]->(p:Product)
WHERE s.org_id = 'shop1'
RETURN DISTINCT p.product_Category
order by p.published_at DESC
limit 10

Solution

  • I would first change the graph layout a bit, so that product_Category isn't a property of the product anymore, but rather a separate node:

    (:Product)-[:HAS_CATEGORY]->(:Category {name: 'some category name'})
    

    Then you can query your graph like this:

    MATCH (s:Shop {org_id: 123})-[:SELL]->(p:Product)-[:HAS_CATEGORY]->(c:Category)
    WITH c, p
    ORDER BY p.published_at DESC
    WITH c, COLLECT(p)[0] AS latest_product
    RETURN c, latest_product
    

    This query would iterate over every product. if you have less than 5000 products this should work regardless, above 5000 you would need indices/constraints in order to increase performance. if you have a lot of products (millions), then it would make sense to calculate the latest product per category in a cron job and mark it as such:

    (:Category)-[:LATEST_PRODUCT]->(:Product)
    

    Then you can query it like this:

    MATCH (s:Shop {org_id: 123})-[:SELL]->(p:Product)<-[:LATEST_PRODUCT]-(c:Category)
    RETURN c, p