I know that I should primarily use Spark Datasets, but I am wondering whether there are good situations where I should use RDDs instead of Datasets?
In a typical Spark application you should go for the Dataset/DataFrame API. Spark optimizes these structures internally, and they provide you with high-level APIs for manipulating the data. However, there are situations where RDDs are handy:
- when you want low-level, functional-style transformations over key/value pairs (e.g. `reduceByKey`, `aggregateByKey`)
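
To illustrate the kind of key-based pair-RDD operations mentioned above, here is a minimal Scala sketch (assuming a local `SparkSession`; the object and variable names are just placeholders):

```scala
import org.apache.spark.sql.SparkSession

object PairRddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pair-rdd-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // reduceByKey: combines values with the same key pairwise,
    // aggregating partially on each partition before the shuffle.
    val counts = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)

    // aggregateByKey: the accumulator type may differ from the value
    // type; here we build (sum, count) per key to derive an average.
    val nums = sc.parallelize(Seq(("x", 2.0), ("x", 4.0), ("y", 6.0)))
    val avg = nums.aggregateByKey((0.0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1),   // fold a value into the accumulator
      (a, b) => (a._1 + b._1, a._2 + b._2)    // merge accumulators across partitions
    ).mapValues { case (sum, count) => sum / count }
    avg.collect().foreach(println)

    spark.stop()
  }
}
```

Note that the equivalent Dataset code would go through `groupBy`/`agg`; the RDD versions give you direct control over the per-partition combine step.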