SQL is obviously not meant for this but I'm wondering if it's possible to do this with SQL as a sort of challenge. Given an arithmetic parse tree such as:
7-2*3
Which could be represented as:
-
/ \
7 *
/ \
2 3
And stored in SQL as the following (assuming +-/*
binary operations on integers only):
CREATE TABLE expression_tree AS (
SELECT * FROM VALUES
(1, 1, '-', NULL, NULL), -- "-" at top
(1, 2, '7', 'L', 1),
(1, 3, '*', 'R', 1),
(1, 4, '2', 'L', 3),
(1, 5, '3', 'R', 3)
AS tmp(expression_id, node_id, value, side, parent_node_id)
)
┌───────────────┬─────────┬───────┬──────┬────────────────┐
│ expression_id ┆ node_id ┆ value ┆ side ┆ parent_node_id │
╞═══════════════╪═════════╪═══════╪══════╪════════════════╡
│ 1 ┆ 1 ┆ - ┆ ┆ │
│ 1 ┆ 2 ┆ 7 ┆ L ┆ 1 │
│ 1 ┆ 3 ┆ * ┆ R ┆ 1 │
│ 1 ┆ 4 ┆ 2 ┆ L ┆ 3 │
│ 1 ┆ 5 ┆ 3 ┆ R ┆ 3 │
└───────────────┴─────────┴───────┴──────┴────────────────┘
Would it be possible to write a WITH RECURSIVE
cte that could return 7-2*3
for the above? To start I would have something like:
WITH RECURSIVE expression(node_id, node_value, expr_value) AS (
SELECT node_id, value, value FROM expression_tree WHERE parent_node_id IS NULL
UNION ALL
SELECT et.node_id, et.value,
CASE WHEN side='L'
THEN CONCAT(value, expr_value)
ELSE CONCAT(expr_value, value)
END
FROM expression_tree et JOIN expression e ON et.parent_node_id=e.node_id
)
SELECT * FROM expression
┌─────────┬────────────┬────────────┐
│ node_id ┆ node_value ┆ expr_value │
╞═════════╪════════════╪════════════╡
│ 1 ┆ - ┆ - │
│ 2 ┆ 7 ┆ 7- │
│ 3 ┆ * ┆ -* │
│ 4 ┆ 2 ┆ 2-* │
│ 5 ┆ 3 ┆ -*3 │
└─────────┴────────────┴────────────┘
Here is the start of it: https://sqlfiddle.com/sql-server/online-compiler?id=fad7e1c0-985a-49c6-9f1b-fbbe9b83c5eb.
My difficulty in the above is basically that I need both node child node values at the same time otherwise, it will keep overwriting the L child when it gets the R child. How could that be fixed?
Recursive aggregation is difficult if not impossible using a recursive CTE. It is often easier and more efficient to use a temp table and loop until you have no rows left to aggregate.
If you want to keep it as a single query, then in SQL Server you can use a recursive scalar function.
This is made more complicated by the fact that the table is not properly normalized. The values and operators should be in separate columns, if not separate tables. Also expression_id
seems unnecessary, as the root node_id
should uniquely identify an expression.
CREATE OR ALTER FUNCTION dbo.CalculateNode (@node_id bigint)
RETURNS bigint
AS BEGIN
RETURN (
SELECT
CASE
WHEN et.value NOT LIKE '%[^0-9]%'
THEN
TRY_CAST(et.value AS bigint)
ELSE
(
SELECT
CASE et.value
WHEN '+' THEN L + R
WHEN '-' THEN ISNULL(L, 0) - R
WHEN '*' THEN L * R
WHEN '/' THEN L / R
WHEN '%' THEN L % R
END
FROM (
SELECT
MIN(CASE WHEN child.side = 'L' THEN v.value END) AS L,
MIN(CASE WHEN child.side = 'R' THEN v.value END) AS R
FROM expression_tree child
CROSS APPLY (SELECT dbo.CalculateNode(child.node_id)) v(value)
WHERE child.parent_node_id = et.node_id
) children
)
END
FROM expression_tree et
WHERE et.node_id = @node_id
);
END;
Finally just select the parent node
SELECT et.expression_id, dbo.CalculateNode(et.node_id) AS final_value
FROM expression_tree et
WHERE et.parent_node_id IS NULL;