Search code examples
sqloracleansi-sql

Using the same table alias twice in a query


My coworker, who is new to ANSI join syntax, recently wrote a query like this:

SELECT count(*)
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b)
       JOIN table3 t3 ON
             (t3.col_c = t1.col_c);

Note that table3 is joined to both table1 and table2 on different columns, but the two JOIN clauses use the same table alias for table3.

The query runs, but I'm unsure of it's validity. Is this a valid way of writing this query?

I thought the join should be like this:

SELECT count(*)
  FROM table1 t1
       JOIN table2 t2 ON
            (t1.col_a = t2.col_a)
       JOIN table3 t3 ON
            (t2.col_b = t3.col_b AND
             t3.col_c = t1.col_c);

Are the two versions functionally identical? I don't really have enough data in our database yet to be sure.

Thanks.


Solution

  • The first query is a join of 4 tables, the second one is a join of 3 tables. So I don't expect that both queries return the same numbers of rows.

    SELECT *
      FROM table1 t1
           JOIN table2 t2 ON
                (t1.col_a = t2.col_a)
           JOIN table3 t3 ON
                (t2.col_b = t3.col_b)
           JOIN table3 t3 ON
                 (t3.col_c = t1.col_c);
                 
    

    The alias t3 is only used in the ON clause. The alias t3 refers to the table before the ON keyword. I found this out by experimenting. So the pervious query is equvivalent to

    SELECT *
      FROM table1 t1
           JOIN table2 t2 ON
                (t1.col_a = t2.col_a)
           JOIN table3 t3 ON
                (t2.col_b = t3.col_b)
           JOIN table3 t4 ON
                 (t4.col_c = t1.col_c);
    

    and this can be transfotmed in a traditional join

    SELECT *
      FROM table1 t1,
           table2 t2,
           table3 t3,
           table3 t4
    where (t1.col_a = t2.col_a)
        and  (t2.col_b = t3.col_b)
        and (t4.col_c = t1.col_c);
        
    

    The second query is

    SELECT *
      FROM table1 t1
           JOIN table2 t2 ON
                (t1.col_a = t2.col_a)
           JOIN table3 t3 ON
                (t2.col_b = t3.col_b AND
                 t3.col_c = t1.col_c);
                 
    

    This can also transformed in a traditional join

    SELECT *
      FROM table1 t1,
        table2 t2,
        table3 t3
    where (t1.col_a = t2.col_a)
        and (t2.col_b = t3.col_b)
        AND (t3.col_c = t1.col_c);
        
    

    These queries seem to be different. To proof their difference we use the following example:

    create table table1(
        col_a number,
        col_c number
    );
    
    create table table2(
        col_a number,
        col_b number
    );
    
    create table table3(
        col_b number,
        col_c number
    );
    
    insert into table1(col_a, col_c) values(1,3);
    insert into table1(col_a, col_c) values(4,3);
    insert into table2(col_a, col_b) values(1,2);
    insert into table2(col_a, col_b) values(4,2);
    insert into table3(col_b, col_c) values(2,3);
    insert into table3(col_b, col_c) values(2,5);
    insert into table3(col_b, col_c) values(7,9);
    
    commit;
    

    We get the following output

    SELECT *
      FROM table1 t1
           JOIN table2 t2 ON
                (t1.col_a = t2.col_a)
           JOIN table3 t3 ON
                (t2.col_b = t3.col_b)
           JOIN table3 t3 ON
                 (t3.col_c = t1.col_c)
    
    
    | COL_A | COL_C | COL_A | COL_B | COL_B | COL_C | COL_B | COL_C |
    |-------|-------|-------|-------|-------|-------|-------|-------|
    |     1 |     3 |     1 |     2 |     2 |     3 |     2 |     3 |
    |     4 |     3 |     4 |     2 |     2 |     3 |     2 |     3 |
    |     1 |     3 |     1 |     2 |     2 |     5 |     2 |     3 |
    |     4 |     3 |     4 |     2 |     2 |     5 |     2 |     3 |
    
    
    
                 
    SELECT *
      FROM table1 t1
           JOIN table2 t2 ON
                (t1.col_a = t2.col_a)
           JOIN table3 t3 ON
                (t2.col_b = t3.col_b AND
                 t3.col_c = t1.col_c)
    
    | COL_A | COL_C | COL_A | COL_B | COL_B | COL_C |
    |-------|-------|-------|-------|-------|-------|
    |     4 |     3 |     4 |     2 |     2 |     3 |
    |     1 |     3 |     1 |     2 |     2 |     3 |
    

    The number of rows retrieved is different and so count(*) is different.

    The usage of the aliases was surprising. at least for me.

    The following query works because t1 in the where_clause references table2.

    select *
    from table1 t1 join table2 t1 on(1=1)
    where t1.col_b<0;
    

    The following query works because t1 in the where_clause references table1.

    select *
    from table1 t1 join table2 t1 on(1=1)
    where t1.col_c<0;
    

    The following query raises an error because both table1 and table2 contain a column col_a.

    select *
    from table1 t1 join table2 t1 on(1=1)
    where t1.col_a<0;
    

    The error thrown is

    ORA-00918: column ambiguously defined
    

    The following query works, the alias t1 refers to two different tables in the same where_clause.

    select *
    from table1 t1 join table2 t1 on(1=1)
    where t1.col_b<0 and t1.col_c<0;
    

    These and more examples can be found here: http://sqlfiddle.com/#!4/84feb/12

    The smallest counter example

    The smallest counter example is

    table1
    col_a  col_c
        1      2
    
    table2
    col_a  col_b
        1      3
    
    table3
    col_b  col_c
        3      5
        6      2    
    

    Here the second query has an empty result set and the first query returns one row. It can be shown that the count(*) of the second query never exeeds the count(*)of the first query.

    A more detailed explanation

    This behaviour will became more clear if we analyze the following statement in detail.

    SELECT t.col_b, t.col_c
      FROM table1 t
           JOIN table2 t ON
                (t.col_b = t.col_c) ;
            
    

    Here is the reduced syntax for this query in Backus–Naur form derived from the syntax descriptions in the SQL Language Reference of Oracle 12.2. Note that under each syntax diagram there is a link to the Backus–Naur form of this diagram, e.g Description of the illustration select.eps. "reduced" means that I left out all the possibilities that where not used, e,g. the select is defined as

    select::=subquery [ for_update_clause ] ;
    

    Our query does not use the optional for_update_clause, so I reduced the rule to

    select::=subquery
    

    The only exemption is the optional where-clause. I didn't remove it so that this reduced rules can be used to analyze the above query even if we add a where_clause.

    These reduced rule will define only a subset of all possible select statements.

    select::=subquery 
    subquery::=query_block
    query_block::=SELECT select_list FROM join_clause [ where_clause ]
    join_clause::=table_reference  inner_cross_join_clause ...  
    table_reference::=query_table_expression  t_alias query_table_expression::=table
    inner_cross_join_clause::=JOIN table_reference ON condition
    

    So our select statement is a query_block and the join_clause is of type

    table_reference inner_cross_join_clause
    

    where table_reference is table1 t and inner_cross_join_clause is JOIN table2 t ON (t.col_b = t.col_c). The ellipsis ... means that there could be additional inner_cross_join_clauses, but we do not need this here.

    in the inner_cross_join_clause the alias t refers to table2. Only if these references cannot be satisfied the aliasmust be searched in an outer scope. So all the following expressions in the ONcondition are valid:

    t.col_b = t.col_c
    

    Here t.col_b is table2.col_b because t refers to the alias of its inner_cross_join_clause, t.col_c is table1.col_c. t of the inner_cross_join_clause (refering to table2) has no column col_c so the outer scope will be searched and an appropriate alias will be found.

    If we have the clause

    t.col_a = t.col_a
    

    the alias can be found as alias defined in the inner_cross_join_clause to which this ON-condition belongs so t will be resolved to table2.

    if the select list consists of

    t.col_c, t.col_b, t.col_a
    

    instead of * then the join_clause will be searched for an alias and t.col_c will be resolved to table1.col_c (table2 does not contain a column col_c), t.col_b will be resolved to table2.col_b (table1 does not contain a col_b) but t.col_a will raise the error

    ORA-00918: column ambiguously defined
    

    because for the select_list none of the aias definition has a precedenve over the other. If our query also has a where_clause then the aliases are resolved in the same way as if they are used in the select_list.