I've got an org.apache.spark.sql.DataFrame = [t: double, S: long].
Now I want to reduce the DataFrame to every 2nd row, with val n = 2.
Result should be
How would you solve this problem?
I tried inserting a third column and filtering with modulo, but I couldn't get it to work.
If I understand your question correctly, you want to keep every nth row of your DataFrame and remove all the others. Assuming t is not your row index, add an index column and then filter on it:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
import spark.implicits._  // for the $"..." column syntax

val n = 2
// Attach a 1-based row index in the original insertion order,
// then keep only the rows whose index is a multiple of n
val filteredDF = df
  .withColumn("index", row_number().over(Window.orderBy(monotonically_increasing_id())))
  .filter($"index" % n === 0)
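The modulo idea you tried is exactly what the filter above does; here is a minimal sketch of it on a plain Scala collection (hypothetical sample data, no Spark needed) so you can see which rows survive. Because row_number() is 1-based, index % n == 0 keeps rows n, 2n, 3n, and so on:

```scala
// Hypothetical (t, S) pairs standing in for the DataFrame rows
val rows = Seq((0.0, 1L), (1.0, 2L), (2.0, 3L), (3.0, 4L))
val n = 2

// zipWithIndex is 0-based, so shift by 1 to mirror row_number(),
// then keep rows whose 1-based index is divisible by n
val kept = rows.zipWithIndex.collect {
  case (row, i) if (i + 1) % n == 0 => row
}
// kept == Seq((1.0, 2L), (3.0, 4L)), i.e. every 2nd row
```

Note that in a distributed DataFrame there is no inherent row order, which is why the Spark version has to impose one via Window.orderBy(monotonically_increasing_id()) before numbering the rows.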