Search code examples
sqljoingoogle-bigquerylegacy-sql

I want to join two tables with a common column in Big query?


To join the tables, I am using the following query.

SELECT *
FROM(select user as uservalue1 FROM [projectname.FullData_Edited]) as FullData_Edited 
JOIN (select user as uservalue2 FROM [projectname.InstallDate]) as InstallDate 
ON FullData_Edited.uservalue1=InstallDate.uservalue2;

The query works but the joined table only has two columns uservalue1 and uservalue2. I want to keep all the columns present in both the table. Any idea how to achieve that?


Solution

  • #legacySQL
    SELECT <list of fields to output>
    FROM [projectname:datasetname.FullData_Edited] AS FullData_Edited
    JOIN [projectname:datasetname.InstallDate] AS InstallDate
    ON FullData_Edited.user = InstallDate.user
    

    or (and preferable)

    #standardSQL
    SELECT <list of fields to output>
    FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
    JOIN `projectname.datasetname.InstallDate` AS InstallDate
    ON FullData_Edited.user = InstallDate.user
    

    Note, using SELECT * in such cases lead to Ambiguous column name error, so it is better to put explicit list of columns/fields you need to have in your output

    The way around it is in using USING() syntax as in example below.
    Assuming that user is the ONLY ambiguous field - it does the trick

    #standardSQL
    SELECT *
    FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
    JOIN `projectname.datasetname.InstallDate` AS InstallDate
    USING (user)
    

    For example:

    #standardSQL
    WITH `projectname.datasetname.FullData_Edited` AS (
      SELECT 1 user, 'a' field1
    ),
    `projectname.datasetname.InstallDate` AS (
      SELECT 1 user, 'b' field2
    )
    SELECT *
    FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
    JOIN `projectname.datasetname.InstallDate` AS InstallDate
    USING (user)
    

    returns

    user    field1  field2   
    1       a       b    
    

    whereas using ON FullData_Edited.user = InstallDate.user gives below error

    Error: Duplicate column names in the result are not supported. Found duplicate(s): user