Tags: mysql, hive, apache-spark-sql, hadoop2

Unsupported SubQuery Expression ''Fashion'': SubQuery expression refers to Outer query expressions only


I am using the query below:

select UserId, Category from customer_data 
where (Category in ('Fashion','Electronics'))
and (Action in ('Click','AddToCart','Purchase'))
and customer_data.UserId not in (select customer_data.UserId from customer_data where customer_data.Category='Fashion' and customer_data.Category='Electronics');

I get the following error:

hive> Unsupported SubQuery Expression ''Fashion'': SubQuery expression refers to Outer query expressions only.

I am not sure what this error means. Do I need to qualify each column with the table name, like customer_data.Category, in the outer query as well? Could you please help?

Sample data:

UserId,ProductId,Category,Action
1,111,Electronics,Browse
2,112,Fashion,Click
3,113,Kids,AddtoCart
4,114,Food,Purchase
5,115,Books,Logout
6,114,Food,Click
7,113,Kids,AddtoCart
8,115,Books,Purchase
9,111,Electronics,Click
10,112,Fashion,Purchase
3,112,Fashion,Click
12,113,Kids,AddtoCart

Desired output:

Output File
•   userID
•   category

Solution

  • Use analytic functions to calculate a Fashion flag and an Electronics flag per UserId, then keep only the users for whom exactly one of the two flags is set:

    select UserId, Category
    from
    ( -- calculate user-level flags
    select UserId, Category,
           max(fashion_flag)     over (partition by UserId) as user_fashion_flag,
           max(electronics_flag) over (partition by UserId) as user_electronics_flag
    from
    ( -- this subquery may be unnecessary if the case expressions work directly inside max() over
        select UserId, Category,
               case when Category='Fashion'     then 1 else 0 end as fashion_flag,
               case when Category='Electronics' then 1 else 0 end as electronics_flag
          from customer_data
         where (Category in ('Fashion','Electronics'))
           and (Action in ('Click','AddToCart','Purchase'))
    ) f
    ) u
    where user_fashion_flag+user_electronics_flag=1 -- keep only users with exactly one of the two flags set
    ;
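
To see what the flag filter returns, here is a minimal sketch that runs a collapsed form of the query (the case expressions placed directly inside max() over, as the comment in the answer suggests might work) against the question's sample data. It uses Python's built-in sqlite3 module purely as a test harness and assumes a SQLite build with window-function support (3.25 or later); it has not been checked against Hive. User 13 is a hypothetical addition that is not in the question's data, included only so that the filter has a user with both categories to exclude.

```python
import sqlite3

# Sample rows from the question: (UserId, ProductId, Category, Action).
SAMPLE_ROWS = [
    (1, 111, "Electronics", "Browse"),
    (2, 112, "Fashion", "Click"),
    (3, 113, "Kids", "AddtoCart"),
    (4, 114, "Food", "Purchase"),
    (5, 115, "Books", "Logout"),
    (6, 114, "Food", "Click"),
    (7, 113, "Kids", "AddtoCart"),
    (8, 115, "Books", "Purchase"),
    (9, 111, "Electronics", "Click"),
    (10, 112, "Fashion", "Purchase"),
    (3, 112, "Fashion", "Click"),
    (12, 113, "Kids", "AddtoCart"),
]

# Hypothetical user (NOT in the question's data) who acts on both
# categories, so the filter has something to exclude.
HYPOTHETICAL_ROWS = [
    (13, 111, "Electronics", "Click"),
    (13, 112, "Fashion", "Purchase"),
]

# Collapsed form of the answer's query: one subquery instead of two.
QUERY = """
select UserId, Category
from (
    select UserId, Category,
           max(case when Category = 'Fashion'     then 1 else 0 end)
               over (partition by UserId) as user_fashion_flag,
           max(case when Category = 'Electronics' then 1 else 0 end)
               over (partition by UserId) as user_electronics_flag
    from customer_data
    where Category in ('Fashion', 'Electronics')
      and Action in ('Click', 'AddToCart', 'Purchase')
) s
where user_fashion_flag + user_electronics_flag = 1
order by UserId, Category
"""


def single_category_users(rows):
    """Load rows into an in-memory table and run the flag query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table customer_data "
            "(UserId integer, ProductId integer, Category text, Action text)"
        )
        conn.executemany("insert into customer_data values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = single_category_users(SAMPLE_ROWS + HYPOTHETICAL_ROWS)
print(result)
```

On this input, users 2, 3 and 10 are returned with Fashion and user 9 with Electronics. User 1 drops out because Browse is not one of the accepted actions, and the hypothetical user 13 drops out because both flags are set.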