I created a table bucketed on the timeslot column. timeslot is an int column whose values range from 0 to 23.
I created 24 buckets and loaded 10,000,000 rows (6 GB of data) into the bucketed table.
At the same time, I created a normal, non-bucketed table from the same dataset.
Later I ran the same query against both the bucketed and the non-bucketed table:
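To picture how those 24 buckets are laid out, here is a small Python model of bucket assignment. It is a sketch that assumes Hive's legacy (bucketing_version=1) hash for INT columns: the hash of an int is the value itself, and the 0-based bucket index is (hash & Integer.MAX_VALUE) % num_buckets. Newer Hive versions default to a Murmur3-based hash, so real indices may differ. The function names are hypothetical.

```python
# Hypothetical model of Hive's legacy (bucketing_version=1) bucket assignment
# for an INT column. Assumption: the hash of an int is the int itself and the
# 0-based bucket index is (hash & Integer.MAX_VALUE) % num_buckets. Newer Hive
# versions default to a Murmur3-based hash, so real indices may differ.
from typing import Dict, Iterable, Set

NUM_BUCKETS = 24


def legacy_int_bucket(value: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Return the 0-based bucket file index an INT value would be written to."""
    # Masking with 0x7FFFFFFF mirrors Java's `& Integer.MAX_VALUE`, so negative
    # values still map to a non-negative bucket index.
    return (value & 0x7FFFFFFF) % num_buckets


def bucket_layout(values: Iterable[int], num_buckets: int = NUM_BUCKETS) -> Dict[int, Set[int]]:
    """Map each bucket index to the set of distinct column values stored in it."""
    layout: Dict[int, Set[int]] = {}
    for v in values:
        layout.setdefault(legacy_int_bucket(v, num_buckets), set()).add(v)
    return layout


if __name__ == "__main__":
    # timeslot takes the values 0..23.
    for bucket, slots in sorted(bucket_layout(range(24)).items()):
        print(f"bucket file {bucket:2d} -> timeslot {sorted(slots)}")
```

Under this assumption, every timeslot gets its own bucket file: all rows with timeslot = 15 land in bucket file 15, and no file mixes two timeslots.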
select * from bucketed_table where timeslot = 15;
select * from non_bucketed_table where timeslot = 15;
Both queries take almost the same time.
I assumed the bucketed table would perform far better than the non-bucketed one.
Can anyone tell me whether I am doing something wrong, or whether my assumption is simply wrong?
As far as I understand, a bucketed table mainly performs better when it is joined with another table bucketed on the same column. If you just filter on the bucketed column, there is no performance gain: in that case both the bucketed and the non-bucketed table scan the whole table (all data files), which is why the same number of mappers is launched in both cases.
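The reasoning above can be made concrete with a small Python model of which files a WHERE timeslot = 15 query has to read. It is a sketch under stated assumptions: no bucket pruning is enabled, each data file counts as roughly one input split (one mapper), and the file names are made up. The partitioned layout (PARTITIONED BY (timeslot) in Hive) is included only as a contrast, because there the filter column is encoded in the directory path.

```python
# Hypothetical model: which data files must a `WHERE timeslot = 15` query read
# under each table layout, assuming NO bucket pruning? Each file is treated as
# roughly one input split (one mapper). File names are illustrative only.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataFile:
    path: str
    # Only partitioned layouts record the column value in the directory path.
    partition_value: Optional[int] = None


def plain_table(num_files: int) -> List[DataFile]:
    return [DataFile(f"non_bucketed_table/part-{i:05d}") for i in range(num_files)]


def bucketed_table(num_buckets: int) -> List[DataFile]:
    # One file per bucket; the name encodes the bucket index, not the timeslot
    # value, so this model has no way to skip any of these files.
    return [DataFile(f"bucketed_table/{i:06d}_0") for i in range(num_buckets)]


def partitioned_table(values: range) -> List[DataFile]:
    # One directory per value, e.g. .../timeslot=15/, as PARTITIONED BY would create.
    return [DataFile(f"partitioned_table/timeslot={v}/000000_0", v) for v in values]


def files_to_scan(files: List[DataFile], timeslot: int) -> List[DataFile]:
    """Keep a file unless its partition value proves it cannot match the filter."""
    return [f for f in files if f.partition_value is None or f.partition_value == timeslot]


if __name__ == "__main__":
    layouts = {
        "non-bucketed": plain_table(24),  # file count is an arbitrary assumption
        "bucketed (24 buckets)": bucketed_table(24),
        "partitioned by timeslot": partitioned_table(range(24)),
    }
    for name, files in layouts.items():
        print(f"{name:25s} files read: {len(files_to_scan(files, 15))}")
```

In this model both the non-bucketed and the bucketed table read all 24 files, matching the observation that the same number of mappers is launched, while the partitioned layout reads only the timeslot=15 directory. This is why partitioning, rather than bucketing, is the usual choice for a low-cardinality column that queries filter on.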