Search code examples
sqlrecursioncommon-table-expression

Storing an arithmetic parse tree in SQL


SQL is obviously not meant for this but I'm wondering if it's possible to do this with SQL as a sort of challenge. Given an arithmetic parse tree such as:

7-2*3

Which could be represented as:

    -
   / \
  7   *
     / \
    2   3

And stored in SQL as the following (assuming +-/* binary operations on integers only):

CREATE TABLE expression_tree AS (
    SELECT * FROM VALUES
    (1, 1, '-', NULL, NULL), -- "-" at top
    (1, 2, '7', 'L', 1), 
    (1, 3, '*', 'R', 1), 
    (1, 4, '2', 'L', 3), 
    (1, 5, '3', 'R', 3)
    AS tmp(expression_id, node_id, value, side, parent_node_id)
)

┌───────────────┬─────────┬───────┬──────┬────────────────┐
│ expression_id ┆ node_id ┆ value ┆ side ┆ parent_node_id │
╞═══════════════╪═════════╪═══════╪══════╪════════════════╡
│             1 ┆       1 ┆ -     ┆      ┆                │
│             1 ┆       2 ┆ 7     ┆ L    ┆              1 │
│             1 ┆       3 ┆ *     ┆ R    ┆              1 │
│             1 ┆       4 ┆ 2     ┆ L    ┆              3 │
│             1 ┆       5 ┆ 3     ┆ R    ┆              3 │
└───────────────┴─────────┴───────┴──────┴────────────────┘

Would it be possible to write a WITH RECURSIVE cte that could return 7-2*3 for the above? To start I would have something like:

WITH RECURSIVE expression(node_id, node_value, expr_value) AS (
    SELECT node_id, value, value FROM expression_tree WHERE parent_node_id IS NULL
    UNION ALL
    SELECT et.node_id, et.value, 
        CASE WHEN side='L' 
          THEN CONCAT(value, expr_value) 
          ELSE CONCAT(expr_value, value) 
        END
        FROM expression_tree et JOIN expression e ON  et.parent_node_id=e.node_id
) 
SELECT * FROM expression

┌─────────┬────────────┬────────────┐
│ node_id ┆ node_value ┆ expr_value │
╞═════════╪════════════╪════════════╡
│       1 ┆ -          ┆ -          │
│       2 ┆ 7          ┆ 7-         │
│       3 ┆ *          ┆ -*         │
│       4 ┆ 2          ┆ 2-*        │
│       5 ┆ 3          ┆ -*3        │
└─────────┴────────────┴────────────┘

Here is the start of it: https://sqlfiddle.com/sql-server/online-compiler?id=fad7e1c0-985a-49c6-9f1b-fbbe9b83c5eb.

My difficulty in the above is basically that I need both node child node values at the same time otherwise, it will keep overwriting the L child when it gets the R child. How could that be fixed?


Solution

  • Recursive aggregation is difficult if not impossible using a recursive CTE. It is often easier and more efficient to use a temp table and loop until you have no rows left to aggregate.

    If you want to keep it as a single query, then in SQL Server you can use a recursive scalar function.

    This is made more complicated by the fact that the table is not properly normalized. The values and operators should be in separate columns, if not separate tables. Also expression_id seems unnecessary, as the root node_id should uniquely identify an expression.

    • If the value of the node is convertible to a number then use that.
    • Else take the child nodes and pivot them up into left and right...
    • ... taking in each case the recursive value of the node.
    • ... And use the correct operator to combine them.
    CREATE OR ALTER FUNCTION dbo.CalculateNode (@node_id bigint)
    RETURNS bigint
    AS BEGIN
      RETURN (
        SELECT
          CASE
            WHEN et.value NOT LIKE '%[^0-9]%'
            THEN
              TRY_CAST(et.value AS bigint)
            ELSE
            (
              SELECT
                CASE et.value
                  WHEN '+' THEN L + R
                  WHEN '-' THEN ISNULL(L, 0) - R
                  WHEN '*' THEN L * R
                  WHEN '/' THEN L / R
                  WHEN '%' THEN L % R
                END
              FROM (
                  SELECT
                    MIN(CASE WHEN child.side = 'L' THEN v.value END) AS L,
                    MIN(CASE WHEN child.side = 'R' THEN v.value END) AS R
                  FROM expression_tree child
                  CROSS APPLY (SELECT dbo.CalculateNode(child.node_id)) v(value)
                  WHERE child.parent_node_id = et.node_id
              ) children
            )
          END
        FROM expression_tree et
        WHERE et.node_id = @node_id
      );
    END;
    

    Finally just select the parent node

    SELECT et.expression_id, dbo.CalculateNode(et.node_id) AS final_value
    FROM expression_tree et
    WHERE et.parent_node_id IS NULL;
    

    db<>fiddle