dataframe, scala, apache-spark, apache-spark-sql

SQL Sub Query issue in Spark


Hi, I have the SQL query below. When I run it in a PostgreSQL query tool it works fine, but when I run it through Spark SQL it fails with the following error in the Spark logs:

val SQLDF = spark.sqlContext.sql("""
select m.v_number, o.u_id, r_number, v_name, p.pr_name
from
    PG_DF_VHM m,
    OBU_VIN_DF o,
    PROJ_NAME_DF p
where
    m.v_number = o.v_number
    and p.pr_id in (
        select pr_id
        from PROJ_USER_DF
        where
            user_id in (
                select user_id
                from PG_MAP_DF
                where
                    project_user_id in (
                        select project_user_id
                        from PROJ_USER_DF u
                        where
                            u.user_id = o.created_by
                    )
                    and user_role_id = 3
                limit 1
            )
    )
""")

ERROR ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: 
cannot resolve '`o.created_by`' 
given input columns: [
u.project_id, 
u.created_by, 
u.user_id, 
u.project_user_id, 
u.updated_timestamp, 
u.is_deleted, 
u.updated_by, 
u.created_timestamp
]; line 1 pos 33

Solution

  • from PG_DF_VHM m, OBU_VIN_DF o, PROJ_NAME_DF p is the problem. With the comma-separated FROM list and the deeply nested correlated subquery, Spark cannot resolve the outer alias reference o.created_by (that is exactly what the AnalysisException is complaining about). Join the tables explicitly on a common column instead, e.g. if id were the common column: from PG_DF_VHM m join OBU_VIN_DF o on m.id = o.id join PROJ_NAME_DF p on m.id = p.id

    Something along those lines; see the sketch below for the full query.
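
For reference, here is a rough sketch of how the whole query could be rewritten with explicit joins instead of the nested IN subqueries. Table and column names are taken from the question; the aliases u, g and pu, and the use of select distinct in place of the inner limit 1, are assumptions, so adjust them to your actual data.

val sqlDF = spark.sqlContext.sql("""
select distinct m.v_number, o.u_id, r_number, v_name, p.pr_name
from
    PG_DF_VHM m
    join OBU_VIN_DF o on m.v_number = o.v_number
    -- resolve o.created_by through a join rather than a correlated subquery
    join PROJ_USER_DF u on u.user_id = o.created_by
    join PG_MAP_DF g on g.project_user_id = u.project_user_id
                    and g.user_role_id = 3
    join PROJ_USER_DF pu on pu.user_id = g.user_id
    join PROJ_NAME_DF p on p.pr_id = pu.pr_id
""")
sqlDF.show()

Note that r_number and v_name are left unqualified as in the original query; qualify them if they become ambiguous once the extra tables are joined in. The distinct stands in for the original limit 1, which has no direct equivalent once the subquery is flattened into a join; if you really need only the first matching PG_MAP_DF row, that part has to be handled separately (for example with a window function).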