Say I have some data like:
grp v1 v2
--- -- --
2 5 7
2 4 9
3 10 2
3 11 1
I'd like to create new columns which are independent of the ordering of the table - such that the two columns have independent orderings, i.e. sort by v1 independently of v2, while partitioning by grp.
The result (independently ordered, partitioned by grp) would be:
grp v1 v2 v1_ordered v2_ordered
--- -- -- ---------- ----------
2 5 7 4 7
2 4 9 5 9
3 10 2 10 1
3 11 1 11 2
One way to do this is to create two tables and CROSS JOIN. However, I'm working with too many rows of data for this to be computationally tractable - is there a way to do this within a single query without a JOIN?
Basically, I'd like to write SQL like:
SELECT
*,
v1 OVER (PARTITION BY grp ORDER BY v1 ASC) as v1_ordered,
v2 OVER (PARTITION BY grp ORDER BY v2 ASC) as v2_ordered
FROM [example_table]
This breaks table row meaning, but it's a necessary feature for many applications - for example computing ordered correlation between two fields CORR(v1_ordered, v2_ordered).
Is this possible?
I think you are in right direction! You just need to use proper window function . Row_number() in this case. And it should work!
Adding working example as per @cgn request:
I dont think there is way to totally avoid use of JOIN.
At the same time below example uses just ONE JOIN vs TWO JOINs in other answers:
SELECT
a.grp AS grp,
a.v1 AS v1,
a.v2 AS v2,
a.v1 AS v1_ordered,
b.v2 AS v2_ordered
FROM (
SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v1) AS v1_order
FROM [example_table]
) AS a
JOIN (
SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v2) AS v2_order
FROM [example_table]
) AS b
ON a.grp = b.grp AND a.v1_order = b.v2_order
Result is as expected:
grp v1 v2 v1_ordered v2_ordered
2 4 9 4 7
2 5 7 5 9
3 10 2 10 1
3 11 1 11 2
And now you can use CORR() as below
SELECT grp, CORR(v1_ordered, v2_ordered) AS [corr]
FROM (
SELECT
a.grp AS grp,
a.v1 AS v1,
a.v2 AS v2,
a.v1 AS v1_ordered,
b.v2 AS v2_ordered
FROM (
SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v1) AS v1_order
FROM [example_table]
) AS a
JOIN (
SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v2) AS v2_order
FROM [example_table]
) AS b
ON a.grp = b.grp AND a.v1_order = b.v2_order
)
GROUP BY grp