sql postgresql amazon-web-services concatenation amazon-redshift

Amazon RSQL concat of two tables with 2 shared columns

I have such an example of tables:

table1:

col 1	col2	col3	col4
1	a	x1	asdc
2	b	x2	czxa
3	c	x3	xfsdaa

table2:

col2	col3
l	x56
q	x99

All I want is to receive such a final concatenation

col 1	col2	col3	col4
1	a	x1	asdc
2	b	x2	czxa
3	c	x3	xfsdaa
null	l	x56	null
null	q	x99	null

How can I make it in RSQL (postgresql based SQL)? I thought about selecting all columns from the first table, then union all and use NULL AS col1 [...] NULL AS col4 in the place of missing columns, but in real data I have dozens of columns so the code won't look great with it.

Is there some other way to achieve it?

Solution

If you don't mind reordered columns you can get away with a natural full outer join:

USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b = T2.b.

NATURAL is a shorthand form of USING: it forms a USING list consisting of all column names that appear in both input tables. As with USING, these columns appear only once in the output table. If there are no common column names, NATURAL JOIN behaves like JOIN ... ON TRUE, producing a cross-product join.

Which means it'll deduplicate columns with matching names, without you having to name them or even know them. Demo at db<>fiddle:

select * from table1 natural full outer join table2;

col2	col3	col1	col4
a	x1	1	asdc
b	x2	2	czxa
c	x3	3	xfsdaa
l	x56	null	null
q	x99	null	null

Problem is, matching values in columns with matching names will be joined together. It might as well be the desired effect but if it's not, you can force the mismatch using the fact that joining on null values results in a mismatch:

select * 
from (select tableoid::regclass,null as force_mismatch,* from table1) a
natural full outer join
     (select tableoid::regclass,null as force_mismatch,* from table2) b;

tableoid	force_mismatch	col2	col3	col1	col4
table1	null	a	x1	1	asdc
table1	null	b	x2	2	czxa
table1	null	c	x3	3	xfsdaa
table2	null	c	x3	null	null
table2	null	l	x56	null	null
table2	null	q	x99	null	null

I've added tableoid system column only to show (c, x3) was in both tables. It potentially offers some added utility and you can force the mismatch using just that, but otherwise it's enough to add a constantly null column with the same name on both sides.

Keep in mind that natural join is a sensitive topic: it's risky and typically advised against.

That being said, what you proposed with the UNION clause and null constants wherever you want to skip a field is the most obvious, but it should also perform better: demo

create table table1(col1, col2, col3, col4) as values
 (1,    'a',    'x1',   'asdc')
,(2,    'b',    'x2',   'czxa')
,(3,    'c',    'x3',   'xfsdaa');

create table table2(col2,col3) as values
 ('l',  'x56')
,('q',  'x99');

select col1, col2, col3, col4 from table1
union all
select null, col2, col3, null from table2;

col1	col2	col3	col4
1	a	x1	asdc
2	b	x2	czxa
3	c	x3	xfsdaa
null	l	x56	null
null	q	x99	null