I need to load a Hive table using spark-sql and then run some machine-learning algorithms on it. I do that by writing:
val dataSet = sqlContext.sql(" select * from table")
It works well, but if I wanted to increase the number of partitions of the dataSet DataFrame, how could I do that? With a normal RDD I can do it by writing:
val dataSet = sc.textFile("....", N)
where N is the number of partitions I want to have.
Thanks
You can coalesce or repartition the resulting DataFrame, i.e.:
val dataSet = sqlContext.sql(" select * from table").coalesce(N)
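Note that coalesce can only reduce the number of partitions (it avoids a full shuffle), so since you want to increase the partition count, repartition is the call you need; it performs a full shuffle to redistribute the data. A minimal sketch, assuming a spark-shell session where sqlContext is already in scope and N is your target count:

val dataSet = sqlContext.sql("select * from table").repartition(N)
// Verify the resulting partition count
println(dataSet.rdd.partitions.length)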