I am having trouble writing a query which would return the parental id. I have a table that looks like this:
CREATE TABLE csx
(
id INT UNSIGNED PRIMARY KEY,
cd VARCHAR(11),
category VARCHAR(20),
lvl INT,
parent_id INT
)
INSERT INTO csx
VALUES
(1,"ab-00-00-00",'ab',1,null),
(2,"ac-00-00-00",'ac',1,null),
(3,"ac-01-00-00",'ac',2,2),
(4,"ac-01-00-01",'ac',3,3),
(5,"ac-01-00-02",'ac',3,3),
(6,"ac-03-00-00",'ac',2,2),
(7,"ac-03-00-01",'ac',3,6),
(8,"ac-03-00-02",'ac'3,6),
(9,"ac-02-00-00",'ac'2,2),
(10,"ac-02-00-01",'ac',3,9)
I want to check whether parent_id (referencing the id of the entry) is correct.
I am new to recursive CTEs (I think those have to be used). Could you please shed some light on the correct way to implement the CTE which would return parental ids?
Instead of trying to verify values after the fact, ensure there can never be invalid ParentID
s with a foreign key :
CREATE TABLE csx
(
id INT PRIMARY KEY,
cd VARCHAR(11),
category VARCHAR(20),
lvl INT,
parent_id INT references csx(id)
)
If you wanted to ensure there are no invalid ParentIDs in an existing table, a simple LEFT JOIN would be enough to find all problems:
SELECT t1.*
from csx t1 left join csx t2 on t1.ParentID=t2.ID
where t2.ID is null
This will return all rows with a non-existent ParentID. The FOREIGN KEY on the other hand ensures there won't be any invalid values in the first place.
To calculate levels and paths you can use a recursive CTE. A CTE is more-or-less a subquery that can be referenced by name and used in multiple places. A recursive CTE is a CTE that refers to itsel.
To get all root items and their children, the following CTE first selects all roots, then joins the actual table with itself to retrieve the children :
with cte as (
select csx.* ,
1 as Level,
cast(ID as varchar(200)) as Path
from csx
where parent_id is null
union all
select csx.* ,
cte.Level+1 as Level,
cast(CONCAT_WS('/',cte.Path, csx.ID) as varchar(200)) As Path
from csx inner join cte on cte.ID=csx.parent_id
)
select * from cte
order by path
id | cd | category | lvl | parent_id | Level | Path |
---|---|---|---|---|---|---|
1 | ab-00-00-00 | ab | 1 | NULL | 1 | 1 |
2 | ac-00-00-00 | ac | 1 | NULL | 1 | 2 |
3 | ac-01-00-00 | ac | 2 | 2 | 2 | 2/3 |
4 | ac-01-00-01 | ac | 3 | 3 | 3 | 2/3/4 |
5 | ac-01-00-02 | ac | 3 | 3 | 3 | 2/3/5 |
6 | ac-03-00-00 | ac | 2 | 2 | 2 | 2/6 |
7 | ac-03-00-01 | ac | 3 | 6 | 3 | 2/6/7 |
8 | ac-03-00-02 | ac | 3 | 6 | 3 | 2/6/8 |
9 | ac-02-00-00 | ac | 2 | 2 | 2 | 2/9 |
10 | ac-02-00-01 | ac | 3 | 9 | 3 | 2/9/10 |
The first query selects the roots and sets the root values for the Level (1) and Path (ID). The next query joins the table with the CTE to match roots and children.
The question's cd
column isn't a path though. It looks like a row number inside each parent's direct children. Calculating row numbers is the job of the ROW_NUMBER function. Since we're counting inside a parent's children, we can use ROW_NUMBER() OVER(PARTITION BY Parent_ID ORDER BY ID)
.
with cte as (
select csx.* ,
1 as Level,
cast(category as varchar(200)) as Path
from csx
where parent_id is null
union all
select csx.* ,
cte.Level+1 as Level,
cast(CONCAT_WS('-',cte.Path, ROW_NUMBER() OVER(PARTITION BY csx.Parent_ID ORDER BY csx.ID)) as varchar(200)) As Path
from csx inner join cte on cte.ID=csx.parent_id
)
select * from cte
order by path;
This produces
id | cd | category | lvl | parent_id | Level | Path |
---|---|---|---|---|---|---|
1 | ab-00-00-00 | ab | 1 | NULL | 1 | ab |
2 | ac-00-00-00 | ac | 1 | NULL | 1 | ac |
3 | ac-01-00-00 | ac | 2 | 2 | 2 | ac-1 |
4 | ac-01-00-01 | ac | 3 | 3 | 3 | ac-1-1 |
5 | ac-01-00-02 | ac | 3 | 3 | 3 | ac-1-2 |
6 | ac-03-00-00 | ac | 2 | 2 | 2 | ac-2 |
7 | ac-03-00-01 | ac | 3 | 6 | 3 | ac-2-1 |
8 | ac-03-00-02 | ac | 3 | 6 | 3 | ac-2-2 |
9 | ac-02-00-00 | ac | 2 | 2 | 2 | ac-3 |
10 | ac-02-00-01 | ac | 3 | 9 | 3 | ac-3-1 |
Unfortunately, the values don't match. With ORDER BY ID
, the row with ID 6 will have a path ac-2
instead of ac-3
. Changing the order will break all other rows. There's no other indicator that could be used to determine the row number of the children, at least not in this table.
This means that either rows 6-10 are all wrong, or thatcd
can't be used to determine if Parent_ID
is wrong. It doesn't contain identifiers but calculated values. The only way to say if the data match these, is to try and reproduce them. Unfortunately, there's not enough information in the table to do so