I need some help.
I have table № 1:
id | val1 | val2 |
---|---|---|
0 | 15 | 17 |
1 | 5 | 6 |
.... | ... | ... |
I have table № 2:
id_another | val1 | val2 | val3 |
---|---|---|---|
0 | 15 | 17 | 100 |
1 | 5 | 6 | 200 |
I want to get:
id | val1 | val2 | belongs_to_id_another_0 | belongs_to_id_another_1 | is_outlier |
---|---|---|---|---|---|
0 | 15 | 17 | 1 | NULL | NULL |
1 | 5 | 6 | NULL | NULL | 1 |
.... | ... | ... | ... | ... |
Explanation: For each entity in Table 1 I should apply function-comparison with every entity in Table 2 (Size order for Table 1 and Table 2 is ~1000:1, so the time complexity would not be problem). The function is:
CREATE OR REPLACE FUNCTION is_loc_inside_cls (
loc_longitude FLOAT,
loc_latitude FLOAT,
cls_longitude FLOAT,
cls_latitude FLOAT,
cls_radius FLOAT
)
RETURNS INT AS $$
BEGIN
IF (2 * 6378137.0 * ASIN(SQRT(SIN((loc_latitude * PI() / 180.0 - cls_latitude * PI() / 180.0) / 2) * SIN((loc_latitude * PI() / 180.0 - cls_latitude * PI() / 180.0) / 2) + (COS(loc_latitude * PI() / 180.0 ) * COS(cls_latitude * PI() / 180.0)) * SIN((loc_longitude * PI() / 180.0 - cls_longitude * PI() / 180.0) / 2) * SIN((loc_longitude * PI() / 180.0 - cls_longitude * PI() / 180.0) / 2)))) * (2 * 6378137.0 * ASIN(SQRT(SIN((loc_latitude * PI() / 180.0 - cls_latitude * PI() / 180.0) / 2) * SIN((loc_latitude * PI() / 180.0 - cls_latitude * PI() / 180.0) / 2) + (COS(loc_latitude * PI() / 180.0 ) * COS(cls_latitude * PI() / 180.0)) * SIN((loc_longitude * PI() / 180.0 - cls_longitude * PI() / 180.0) / 2) * SIN((loc_longitude * PI() / 180.0 - cls_longitude * PI() / 180.0) / 2)))) <= cls_radius * cls_radius THEN
RETURN 1;
ELSE
RETURN NULL;
END IF;
END; $$
LANGUAGE PLPGSQL;
I then create as many columns as entities in Table 2 and I place there the return. is_outlier value == 1 if in these new columns the NULL value is everywhere, == NULL otherwise. Question is how I can apply this logic? Couldn't find any valid examples.
I can't think of the part where I should implement FOR LOOP logic, I managed to implement only function
Before addressing the question, the following are a few points regarding general code improvement:
The following refactors is_loc_inside_cls to return a boolean. The IF
statement has been removed from the original version of the function and the result is directly returned. This version of the function returns NULL to indicate that the condition is unknown if any of the arguments are NULL; otherwise, it returns either TRUE or FALSE.
CREATE OR REPLACE FUNCTION is_loc_inside_cls (loc_longitude float, loc_latitude float, cls_longitude float, cls_latitude float, cls_radius float)
RETURNS boolean
AS $$
BEGIN
RETURN (2 * 6378137.0 * ASIN(SQRT(SIN((loc_latitude * PI() / 180.0 - cls_latitude * PI() / 180.0) / 2) * SIN((loc_latitude * PI()
/ 180.0 - cls_latitude * PI() / 180.0) / 2) + (COS(loc_latitude * PI() / 180.0) * COS(cls_latitude * PI() / 180.0)) *
SIN((loc_longitude * PI() / 180.0 - cls_longitude * PI() / 180.0) / 2) * SIN((loc_longitude * PI() / 180.0 - cls_longitude *
PI() / 180.0) / 2)))) * (2 * 6378137.0 * ASIN(SQRT(SIN((loc_latitude * PI() / 180.0 - cls_latitude * PI() / 180.0) / 2) *
SIN((loc_latitude * PI() / 180.0 - cls_latitude * PI() / 180.0) / 2) + (COS(loc_latitude * PI() / 180.0) * COS(cls_latitude *
PI() / 180.0)) * SIN((loc_longitude * PI() / 180.0 - cls_longitude * PI() / 180.0) / 2) * SIN((loc_longitude * PI() / 180.0 -
cls_longitude * PI() / 180.0) / 2)))) <= cls_radius * cls_radius;
END;
$$
LANGUAGE PLPGSQL;
For the following query, I've changed the column names from val1, val2, and val3 to latitude, longitude, and radius, respectively. This query uses the refactored version of is_loc_inside_cls, so belongs_to_id_another_0 and belongs_to_id_another_1 might each be TRUE, FALSE, or NULL. If either belongs_to_id_another_0 or belongs_to_id_another_1 is TRUE, then is_outlier will be FALSE regardless of the other's value. If both are FALSE, then is_outlier will be TRUE. If neither are TRUE and either or both are NULL, then is_outlier will also be NULL, indicating that the state could not be determined.
SELECT
table_1.id,
table_1.latitude,
table_1.longitude,
bool_and(
CASE table_2.id
WHEN 0 THEN
t.is_inside_cls
END) AS belongs_to_id_another_0,
bool_and(
CASE table_2.id
WHEN 1 THEN
t.is_inside_cls
END) AS belongs_to_id_another_1,
bool_and(NOT t.is_inside_cls)
OR nullif (bool_or(t.is_inside_cls IS NULL), TRUE) AS is_outlier
FROM
table_1
CROSS JOIN table_2
JOIN LATERAL (
VALUES (is_loc_inside_cls (table_1.latitude, table_1.longitude, table_2.latitude, table_2.longitude, table_2.radius))) t (is_inside_cls) ON TRUE
GROUP BY
table_1.id,
table_1.latitude,
table_1.longitude
ORDER BY
table_1.id;
This query can easily be extended to support additional rows by copying the pattern for the belongs_to_id_another_* columns. The logic for is_outlier is not dependent on the number of rows in table_2.
If it is available in the target environment, consider using the crosstab function that is provided by the tablefunc PostgreSQL extension. If the tablefunc extension is not available and there is a need to support an arbitrary number of rows from table_2, then create a function to dynamically generate the SQL and then execute the resulting query.