A 'with cube' clause is generating duplicate rows, and I don't understand why. Here is an example:
CREATE TABLE venta_mes_hist (
tienda_id INT,
empresa_id INT,
dia_id DATE,
total_linea_brut DECIMAL(18, 2)
);
CREATE TABLE lk_tienda (
tienda_id INT,
empresa_id INT,
tienda_desc VARCHAR(50)
);
-- Insert data into lk_tienda
INSERT INTO lk_tienda (tienda_id, empresa_id, tienda_desc)
VALUES
(1, 1, 'SHIBORI'),
(2, 1, 'NARA'),
(3, 1, 'OSAKA');
-- Insert data into venta_mes_hist
INSERT INTO venta_mes_hist (tienda_id, empresa_id, dia_id, total_linea_brut)
VALUES
(1, 1, '2023-03-01', 100.00),
(1, 1, '2023-03-28', 2544.84),
(1, 1, '2023-04-15', 200.00),
(2, 1, '2023-03-01', 150.00),
(2, 1, '2023-03-28', 3000.00),
(2, 1, '2023-04-15', 250.00),
(3, 1, '2023-03-01', 200.00),
(3, 1, '2023-03-28', 3200.00),
(3, 1, '2023-04-15', 300.00),
(1, 1, '2023-05-10', 500.00);
Then I run this query:
DROP TABLE IF EXISTS #avenut;
SET DATEFORMAT mdy;
select cast(lk_tienda.tienda_desc as varchar(35)) Botiga,venta_mes_hist.dia_id Dia,cast(month(venta_mes_hist.dia_id) as int) Mes,
sum(venta_mes_hist.total_linea_brut) Venut_valor_brut
into #avenut
from lk_tienda, venta_mes_hist
where
lk_tienda.tienda_id = venta_mes_hist.tienda_id and lk_tienda.empresa_id = venta_mes_hist.empresa_id
and venta_mes_hist.empresa_id in ('1')
and venta_mes_hist.dia_id >= '03/01/2023'
and venta_mes_hist.dia_id <= '06/06/2024'
group by cast(lk_tienda.tienda_desc as varchar(35)), venta_mes_hist.dia_id, cast(month(venta_mes_hist.dia_id) as int)
with cube
order by cast(lk_tienda.tienda_desc as varchar(35)), venta_mes_hist.dia_id, cast(month(venta_mes_hist.dia_id) as int)
When I run:
SELECT * FROM #avenut
WHERE Botiga = 'SHIBORI' AND Dia = '2023-03-28';
It returns two rows:
Botiga Dia Mes Venut_valor_brut
SHIBORI 2023-03-28 3 2544.84
SHIBORI 2023-03-28 3 2544.84
How is this possible? Of course, I expect one row, as if I understand correctly, the WITH CUBE clause generates totals by inserting rows with null values in some fields.
If you remove the month column, it returns one row. And if you add a new year column while maintaining the month column, it returns 4 rows!
Your issue is that you have added cast(month(venta_mes_hist.dia_id as int))
as one of the group by
clauses. It firstly wasn't necessary anyway, as you have added venta_mes_hist.dia_id
as a clause also (so you just can do the calculation only in the select
), and it's also messing up the cube
which will multiply out all the clauses. So it's the equivalent in grouping sets
of:
group by grouping sets (
(cast(lk_tienda.tienda_desc as varchar(35)), venta_mes_hist.dia_id, cast(month(venta_mes_hist.dia_id) as int)),
(cast(lk_tienda.tienda_desc as varchar(35)), venta_mes_hist.dia_id),
(cast(lk_tienda.tienda_desc as varchar(35)), cast(month(venta_mes_hist.dia_id) as int)),
(venta_mes_hist.dia_id, cast(month(venta_mes_hist.dia_id) as int)),
(cast(lk_tienda.tienda_desc as varchar(35))),
(venta_mes_hist.dia_id),
(cast(month(venta_mes_hist.dia_id) as int)),
()
)
Just remove that clause from the group by
. It can also be removed from the order by
as it's functionally dependent on
with cube
is deprecated, use cube()
instead.cast(... as varchar(35))
seems unnecessary, you can remove that.select
lk_tienda.tienda_desc Botiga,
venta_mes_hist.dia_id Dia,
cast(month(venta_mes_hist.dia_id) as int) Mes,
sum(venta_mes_hist.total_linea_brut) Venut_valor_brut
from lk_tienda, venta_mes_hist
where
lk_tienda.tienda_id = venta_mes_hist.tienda_id and lk_tienda.empresa_id = venta_mes_hist.empresa_id
and venta_mes_hist.empresa_id in ('1')
and venta_mes_hist.dia_id >= '03/01/2023'
and venta_mes_hist.dia_id <= '06/06/2024'
group by cube (
lk_tienda.tienda_desc,
venta_mes_hist.dia_id
)
order by
lk_tienda.tienda_desc,
venta_mes_hist.dia_id;
You can also do this using grouping sets
, which some find to be clearer.
group by grouping sets (
(lk_tienda.tienda_desc, venta_mes_hist.dia_id),
(lk_tienda.tienda_desc),
(venta_mes_hist.dia_id),
()
)