Tags: scala, apache-spark, apache-spark-sql, apache-zeppelin

Reduce the size of a Spark DataFrame by selecting only every n-th element with Scala


I have an org.apache.spark.sql.DataFrame = [t: double, S: long]

[image: the original DataFrame's contents]

Now I want to reduce the DataFrame by keeping only every 2nd element, i.e. with val n = 2.

The result should be

[image: the expected result]

How would you solve this problem?

I tried inserting a third column and using modulo, but I couldn't solve it.


Solution

  • If I understand your question correctly, you want to keep every nth row of your DataFrame and remove all others. Assuming t is not your row index, add an index column and then filter on it:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._
    
    val n = 2
    val filteredDF = df
      .withColumn("index", row_number().over(Window.orderBy(monotonically_increasing_id())))
      .filter($"index" % n === 0)  // keeps rows 2, 4, 6, ...
      .drop("index")
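The Window/row_number approach above is Spark-specific, but the underlying logic is simply filtering on a 1-based position. A minimal plain-Scala sketch of the same idea on a local collection (the sample data here is hypothetical), which can help verify the modulo condition before running it on a cluster:

```scala
// Keep every n-th element of a sequence, using 1-based indices
// to mirror Spark's row_number() (which also starts at 1).
val data = Vector(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)
val n = 2
val everyNth = data.zipWithIndex
  .filter { case (_, i) => (i + 1) % n == 0 }  // 1-based index divisible by n
  .map { case (value, _) => value }
// everyNth == Vector(0.1, 0.3, 0.5)
```

Note that `% n === 0` keeps the 2nd, 4th, 6th, ... rows; if you instead want to keep the 1st, 3rd, 5th, ... rows, filter with `$"index" % n === 1`.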