Search code examples
postgresqlapache-age

How to add "expensive_meter" property to vertices in a graph database based on their relation to item prices?


I have a graph database that stores data for restaurants and their menu items. The database has vertices labeled "Restaurant" and "Item", and an "Item" vertex is connected to a "Restaurant" vertex by an "OFFERS" edge.

Each item in the database has properties such as price, name, and description. Some restaurants have an "expensive_meter" property associated with them, while others don't.

-- Restaurants (vertex)
SELECT * FROM cypher('DeliveryGraph', $$
    MATCH (v:Restaurant)
    RETURN v.name, v.expensive_meter, v.type, v.address
$$) AS (name agtype, expensive_meter agtype, type agtype, address agtype);

             name             | expensive_meter |         type          |                       address                       
------------------------------+-----------------+-----------------------+-----------------------------------------------------
 "Pagliacci Pizza"            |                 | "Pizzeria"            | "10600 Main St, Bellevue, WA 98004, U.S"
 "MOD Pizza"                  | 1               | "Pizzeria"            | "317 Bellevue Way NE, Bellevue, WA 98004, U.S"
 "Ascend Prime Steak & Sushi" | 4               | "Steakhouse/Japanese" | "10400 NE 4th St Ste 3100, Bellevue, WA 98004, U.S"
(3 rows)


-- Items (vertex)
SELECT * FROM cypher('DeliveryGraph', $$
    MATCH (v:Item)
    RETURN v.name, v.type, v.price
$$) AS (name agtype, type agtype, price agtype);

                name                 |   type    | price 
-------------------------------------+-----------+-------
 "Diavola"                           | "Pizza"   | 25.99
 "Chicken Primo"                     | "Pizza"   | 30.49
 "The Rocket"                        | "Pizza"   | 25.99
 "Mad Dog"                           | "Pizza"   | 10.27
 "Tristan"                           | "Pizza"   | 10.27
 "Jasper"                            | "Pizza"   | 10.27
 "PAT LAFRIEDA PRIME AMERICAN STEAK" | "Beef"    | 87.0
 "PREMIUM OMAKASE"                   | "Sushi"   | 80.0
 "HIVE"                              | "Dessert" | 68.0
(9 rows)


-- OFFERS (edge)
SELECT * FROM cypher('DeliveryGraph', $$
    MATCH (i)<-[:OFFERS]-(r)
    RETURN r.name, i.name
$$) AS (restaurant agtype, item agtype);

          restaurant          |                item                 
------------------------------+-------------------------------------
 "Pagliacci Pizza"            | "Diavola"
 "Pagliacci Pizza"            | "Chicken Primo"
 "Pagliacci Pizza"            | "The Rocket"
 "MOD Pizza"                  | "Mad Dog"
 "MOD Pizza"                  | "Tristan"
 "MOD Pizza"                  | "Jasper"
 "Ascend Prime Steak & Sushi" | "PAT LAFRIEDA PRIME AMERICAN STEAK"
 "Ascend Prime Steak & Sushi" | "PREMIUM OMAKASE"
 "Ascend Prime Steak & Sushi" | "HIVE"
(9 rows)

Now, I want to add the "expensive_meter" property to the "Restaurant" vertices that currently do not have it, based on the relation of their item prices to the "expensive_meter" values of other restaurants.

For example, if a restaurant has mostly high-priced items, it should be assigned a high "expensive_meter" value. Conversely, if a restaurant offers mostly affordable items, it should have a lower "expensive_meter" value.

I'm looking for guidance on how to implement this logic in my graph. What steps or algorithms can I use to calculate and assign the "expensive_meter" property to the "Restaurant" vertices based on the relation to item prices?

Any suggestions, code snippets, or references to relevant documentation would be greatly appreciated. Thank you in advance for your help!


Solution

  • To determine the "expensive_meter" for pizzas, you can divide the prices into quartiles and calculate the average price within each quartile. Although the ntile function in Postgres could be useful for this task, it is currently not possible to use it with properties in Apache AGE. However, you can achieve the desired result by utilizing the percentileCont function in Apache AGE.

    To begin, let's consider an example where we want to create 5 price groups. We can determine the dividing values for each group using the following query:

    SELECT *
    FROM cypher('DeliveryGraph', $$
        MATCH (n:Item)
        RETURN percentileDisc(n.price, 0.2), percentileDisc(n.price, 0.4), percentileDisc(n.price, 0.6), percentileDisc(n.price, 0.8), percentileDisc(n.price, 1)
    $$) as (group_1 agtype, group_2 agtype, group_3 agtype, group_4 agtype, group_5 agtype);
    

    The query provides the following result:

     group_1 | group_2 | group_3 | group_4 | group_5
    ---------+---------+---------+---------+---------
     10.27   | 25.99   | 30.49   | 80.0    | 87.0
    

    Based on this result, we can determine the price ranges for each group. For instance, prices ranging from 0 to 10.27 would fall into the first group, prices greater than 10.27 up to 25.99 would belong to the second group, and so on.

    Now, to assign the "expensive_meter" value for a specific restaurant, you can calculate the average price for that restaurant and determine the corresponding price group. The "expensive_meter" can then be set as the group number.

    SELECT *
    FROM cypher('DeliveryGraph', $$
        MATCH (r:Restaurant)-[:OFFERS]->(i)
        RETURN r.name, avg(i.price)
    $$) as (average_price agtype);