Search code examples
neo4jcypher

Cypher & Neo4J Beers database


I am trying to solve a question with Cypher in Neo4J where I need to find the BEER_STYLE that has the best combination of smell and looks. The relations are as follows:

(:REVIEW)<-[:REVIEWED]-(:BEER)-[:HAS_STYLE]->(:BEER_STYLE)

One beer sometimes has more than one review and the Reviews nodes have the two properties of interest: look and smell. Sometimes these 2 can be NaN and I have to find the style that has the best combination of those 2 (which I thought it could be the average of averages).

I tried to code a general idea and asked ChatGPT to polish until it works, but I'm not very confident and wanted to ask here. This is the code I got, can someone help/correct me to improve it?

MATCH (s:STYLE)<-[h:HAS_STYLE]-(b:BEERS)-[rev:REVIEWED]->(r:REVIEWS)
WITH s, 
     CASE WHEN r.look <> 'NaN' THEN toFloat(r.look) ELSE null END AS look,
     CASE WHEN r.smell <> 'NaN' THEN toFloat(r.smell) ELSE null END AS smell
WITH s, AVG(look) AS avg_look, AVG(smell) AS avg_smell
WITH s, avg_look, avg_smell, ((avg_look + avg_smell) / 2) AS total
RETURN s AS STYLE, MAX(total) AS Final_Class
ORDER BY Final_Class DESC
LIMIT 2

Solution

  • Preprocess to convert strings to floats

    Assuming that the look and smell values do not need to be stored as strings, you should first change them to floats to avoid having to do conversions every time you use those properties:

    MATCH (r:REVIEWS)
    SET r.look = CASE WHEN r.look <> 'NaN' THEN toFloat(r.look) END
    SET r.smell = CASE WHEN r.smell <> 'NaN' THEN toFloat(r.smell) END
    

    Notes:

    • NULL is the default value for CASE, so it does not need to be specified in an ELSE.
    • When you assign NULL to a property, the property is removed.
    • You should also change the code that imports your data to follow suit.

    Simplified query

    After the above one-time change, your use case can be simplified to:

    MATCH (s:STYLE)<-[:HAS_STYLE]-(:BEERS)-[:REVIEWED]->(r:REVIEWS)
    RETURN s AS STYLE, ((AVG(r.look) + AVG(r.smell)) / 2) AS Final_Class
    ORDER BY Final_Class DESC
    LIMIT 2
    

    Notes:

    • Your original query did not need to call the aggregating function MAX on the total value for each s, since there would only be a single total per s. This is because the aggregating function AVG in the following clause would create only a single avg_look and avg_smell per s:

      WITH s, AVG(look) AS avg_look, AVG(smell) AS avg_smell