I have stumbled across a bit of an issue with U-SQL which for me is a problem I haven't yet found a workaround for.
It seems U-SQL doesnt support anything else but == in joins, so you can't put > or < in the join itself.
For the use case below which I have done in oracle:
create table trf.test_1(
number_col int
);
insert into trf.test_1 VALUES (10);
insert into trf.test_1 VALUES (20);
insert into trf.test_1 VALUES (30);
insert into trf.test_1 VALUES (60);
drop table trf.test_2;
create table trf.test_2(
number_col int
);
insert into trf.test_2 VALUES (20);
insert into trf.test_2 VALUES (30);
SELECT t1.number_col, t2.number_col
FROM trf.test_1 t1
LEFT JOIN trf.test_2 t2 ON t1.number_col < t2.number_col
;
I get the following:
How might I do that in u-sql without the < join?
I tried a cross join, but if you include the < in the where clause it just turns into an inner and you don't get the rows with the nulls.
Any ideas appreciated.
@t1 =
SELECT * FROM
( VALUES
(10),
(20),
(30),
(60)
) AS T(num_col);
@t2 =
SELECT * FROM
( VALUES
(20),
(30)
) AS T(num_col);
@result =
SELECT t1.num_col, t2.num_col AS num_col_2
FROM @t1 AS t1
CROSS JOIN @t2 AS t2
WHERE t1.num_col < t2.num_col;
@result2 =
SELECT t1.num_col, t2.num_col AS num_col_2
FROM @t1 AS t1
LEFT JOIN @result AS t2 ON t1.num_col == t2.num_col;
OUTPUT @result2
TO "/Output/ReferenceGuide/Joins/exampleA.csv"
USING Outputters.Csv();
Edit - I added the left join from the @t1 dataset back to the @result set which seems to work but would be interested if there are any better solutions out there. Seems a bit of a work around.
This is a known feature and discussed extensively in the article "U-SQL SELECT Selecting from joins".
Some quotes from that article:
Join Comparisons
U-SQL, like most scaled out Big Data Query languages that support joins, restricts the join comparison to equality comparisons between columns in the rowsets to be joined...
...
If one has a non-equality comparison or a more complex expression (such as a method invocation) in the comparison, one can move the comparison to the SELECT’s WHERE clause. Or the more complex expression can be placed in an earlier SELECT statement’s column and then that alias can be referred to in the join comparison.
Basically they don't scale particularly well on a distributed platform like ADLA.