Tags: apache-spark, apache-spark-sql, hadoop-partitioning

Spark-SQL DataFrame partitions


I need to load a Hive table using Spark SQL and then run some machine-learning algorithm on it. I do that by writing:

val dataSet = sqlContext.sql(" select * from table")

It works well, but if I want to increase the number of partitions of the dataSet DataFrame, how can I do that? With a normal RDD I can write:

val dataSet = sc.textFile(" .... ", N )

where N is the number of partitions I want to have.

Thanks


Solution

  • You can coalesce or repartition the resulting DataFrame, e.g.:

    val dataSet = sqlContext.sql(" select * from table").coalesce(N)
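    Note that coalesce only merges partitions into fewer ones (it avoids a full shuffle), so it cannot raise the partition count. To actually increase the number of partitions, as asked, you would use repartition, which performs a shuffle. A minimal sketch, assuming the same sqlContext and a placeholder table name:

    // repartition triggers a shuffle and can increase the partition count to N
    val dataSet = sqlContext.sql("select * from table").repartition(N)

    // check how many partitions the DataFrame now has
    println(dataSet.rdd.partitions.length)

    Depending on your Spark version, you may also be able to tune spark.sql.shuffle.partitions, which controls the default number of partitions produced by shuffles in Spark SQL.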