Search code examples
sql-servert-sqldatabricksazure-databricksdatabricks-sql

Refactoring T-SQL to Databricks SQL to return results from Subquery


I have the following T-SQL that when I attempt to execute with Databricks SQL I get the error

Error in SQL statement: AnalysisException: cannot resolve 'rn'

I have a view called db.myview and it would appear Databricks SQL is unable to execute the sub-query for rn (at least that is what I think)

SELECT * 
FROM
(select 
    rn = ROW_NUMBER() OVER (PARTITION BY ID order by ID),
*
    from db.myview c
) c
where c.rn = 1

When I execute the above query in SQL Server I have no problems.


Solution

  • Spark SQL requires the column alias to come after the column expression.

    named_expression

    An expression with an assigned name. In general, it denotes a column expression.

    Syntax: expression [[AS] alias]

    https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select.html

    So

    SELECT * 
    FROM
    (
        select ROW_NUMBER() OVER (PARTITION BY C_NAME order by C_CUSTKEY) as rn, *
        from CUSTOMER c
    ) c
    where c.rn = 1