Search code examples
sqlhadoophivehiveql

hive join tables and create 2 different tables


I am using hive .14 for a school project (so cant use a better version), and currently I analysing data taken from a uber in NY. I am using 2 tables: yellowtaxi22, for the information regarding the trip the uber did (in this case I will only use the pickupid and the dropoff id); and zoneinfo, that simply have the names of the pickup and dropoff id. I created a query to find what where the most common trips the uber took and then I created a view out of it(this being the commontrip), the problem now is that when I want to change the ids to their corresponding names, it only selects the trips where the pickupid=dropid. Example

pulocationid dolocationid count
12 34 20000
43 12 30000
12 12 100
5 91 40000
34 34 70000

the result should be

pulocationid dolocationid count
JFK airport NV 20000
Midtown JFK airport 30000
JFK airport JFK airport 100
Mountain Hill LINCON SQR 40000
NV NV 70000

But the result is

pulocationid dolocationid count
JFK airport JFK airport 100
NV NV 70000

this is the query that I am using, zoneinfo is the table that contains the location´s name (refered as zone) and its id (whcih can be used in either the pickup or the dropoff column (or both), common trips is the name of the view that I used to count the number of times a trip was repeated, this count is named as "trips" in the query:

select /*+ Mapjoin(commontrip)*/ zoneinfo.zone as pickup, zoneinfo.zone as dropoff, commontrip.trips 
from zoneinfo join commontrip on
(zoneinfo.locationid=commontrip.pulocationid and zoneinfo.locationid=commontrip.dolocationid);

Solution

  • You're comparing zoneinfo.locationid to both commontrip.pulocationidand commontrip.dolocationid, meaning those must be equal with each other as well.

    Try joining only one ID column at a time with your "label table"