Im completely new to Datawarehouse, OLAP and hive. I do have a single csv file containing training data about a online market like ebay (see Column Data). My task is to create a star schema in hive (via Data Analytics Studio, Spark, or whatever).
I scaffolded the dimension as followed:
How can i create those dimensions tables and especially generate a new ID for each line?
My SELECT New_Guid(), listing_title, listing_subtitle, listing_type_code, start_price, buy_it_now_price, buy_it_now_listed_flag, qty_available_per_listing From auctions
, which i found on another tutorial. But New_Guid()
does not work at all in my Data Analytics Studio.
Thanks a lot!
Assuming no history in the dimensions, and leaving aside if good dimension design or not:
For each required Dimension:
select (row_number() over()), Col1, Col2, col3, col4 from temp_table
& persist to dimension_tableFor the fact table:
select (row_number() over()), Col1, Col2, col3, col4 from temp_table
and store in another temp_table_2