I have the following SQL (Impala) pseudo query, since it wont compile this way. The interesting part is the last, where I want to do exactly what you can read.
I want to do a LEFT JOIN but in case there is no matching ProductId I want to use a specific ProductId (which is NULL and suppose there is only one but guaranteeing it by using LIMIT 1) and perform a JOIN-like connection so the above conditions in the CASE-WHEN would work properly.
So basically the question is that if there is a way to transform this syntactically incorrect query to a single correct one?
I was trying different things using e.g. ISNULL() and WITH, but since the subquery which you can see in the ELSE part has to use 2 tables to work properly it can not be compiled anyway it would work in my opinion.
SELECT
cd.CycleDataId AS CycleDataId,
CASE
WHEN cd.CycleTime >= op.IdealValue AND cd.CycleTime <= op.MaxValue THEN NVL(dcl.ProductionLossTypeId, -1)
WHEN cd.CycleTime >= op.IdealValue AND cd.CycleTime >= op.MaxValue THEN dcl.ProductionLossTypeId
END AS Verdikt,
CASE
WHEN cd.CycleTime >= op.IdealValue AND cd.CycleTime <= op.MaxValue THEN NVL(dcl.Time, cd.CycleTime - op.IdealValue)
WHEN cd.CycleTime >= op.IdealValue AND cd.CycleTime >= op.MaxValue THEN dcl.Time
END AS Time
FROM CycleData cd
LEFT JOIN DistributedCycleLosses dcl ON dcl.CycleDataId = cd.CycleDataId
CASE
WHEN IF EXISTS(SELECT * FROM Operation_parameter WHERE ProductId = cd.ProductId AND cd.Timestamp_ BETWEEN ValidFrom AND ValidTo) THEN LEFT JOIN Operation_parameter op ON op.ProductId = cd.ProductId AND cd.Timestamp_ BETWEEN op.ValidFrom AND op.ValidTo
ELSE (SELECT * FROM Operation_parameter WHERE ProductId IS NULL AND cd.Timestamp_ BETWEEN ValidFrom AND ValidTo LIMIT 1) AS op
END;
This is basically using a default value. I think this does what you want:
SELECT cd.CycleDataId AS CycleDataId,
(CASE WHEN cd.CycleTime >= COALESCE(op.IdealValue, opnull.IdealValue) AND
cd.CycleTime <= COALESCE(op.MaxValue, opnull.MaxValue)
THEN COALESCE(dcl.ProductionLossTypeId, -1)
WHEN cd.CycleTime >= COALESCE(op.IdealValue, opnull.IdealValue) AND
COALESCE(op.MaxValue, opnull.MaxValue)
THEN dcl.ProductionLossTypeId
END) AS Verdikt,
(CASE WHEN cd.CycleTime >= COALESCE(op.IdealValue, opnull.IdealValue) AND
cd.CycleTime <= COALESCE(op.MaxValue, opnull.MaxValue)
THEN cd.CycleTime >= COALESCE(dcl.Time, cd.CycleTime - COALESCE(op.IdealValue, opnull.IdealValue))
WHEN cd.CycleTime >= COALESCE(op.IdealValue, opnull.IdealValue) AND
cd.CycleTime >= COALESCE(op.MaxValue, opnull.MaxValue)
THEN dcl.Time
END) AS Time
FROM CycleData cd LEFT JOIN
DistributedCycleLosses dcl
ON dcl.CycleDataId = cd.CycleDataId LEFT JOIN
Operation_parameter op
ON op.ProductId = cd.ProductId AND
cd.Timestamp_ BETWEEN op.ValidFrom AND op.ValidTo LEFT JOIN
Operation_parameter opnull
ON op.ProductId IS NULL AND -- no previous match
opnull.ProductID IS NULL AND
cd.Timestamp_ BETWEEN opnull.ValidFrom AND opnull.ValidTo ;
Note that all references to op
are replaced with COALESCE()
expressions.
You can modify this to handle multiple rows for matching NULL
values, if that is really necessary. I think the more important part of the logic are the LEFT JOIN
s.