Search code examples
Filtering RDD based on column values...


apache-spark pyspark apache-spark-sql rdd

Creating data frame out of sequence using toDF method in Apache Spark...


scala apache-spark apache-spark-sql rdd

Creating rdd from another rdd with required specific columns...


apache-spark pyspark apache-spark-sql rdd

Getting some error while performing map/reduce on pyspark RDD...


python apache-spark pyspark rdd

In Python 3.5.2, how to elegantly chain an unknown quantity of functions on an object that changes t...


python apache-spark pyspark rdd

Spark: rdd.count() and rdd.write() are executing transformations twice...


apache-spark rdd

getPersistentRDDs returns Map of cached RDDs and DataFrames in Spark 2.2.0, but in Spark 2.4.7 - it ...


scala apache-spark rdd

get filename and file modification/creation time as (key, value) pair into RDD using pyspark...


python file apache-spark pyspark rdd

Check Type: How to check if something is a RDD or a DataFrame?...


python apache-spark dataframe apache-spark-sql rdd

pyspark - retrieve first element of rdd - top(1) vs. first()...


apache-spark pyspark rdd

Filter an rdd depending on values of a second rdd...


python apache-spark pyspark rdd

Reading JSON RDD using Spark Scala...


json scala apache-spark rdd

How to make two columns from 1 column while dividing data between them in spark?...


scala apache-spark apache-spark-sql rdd case-when

How to read a delimited file using Spark RDD, if the actual data is embedded with same delimiter...


python apache-spark pyspark rdd

How to show top N number of results with customization in spark rdd?...


sorting apache-spark pyspark rdd

How to select and count each individual word from a file?...


scala apache-spark count rdd distinct-values

How to type hint a function that transforms an RDD?...


pyspark rdd

Why is huge data shuffling in Spark when using union()/coalesce(1,false) on DataFrame?...


apache-spark apache-spark-sql rdd shuffle

Spark Core How to fetch max n rows of an RDD function without using Rdd.max()...


apache-spark pyspark rdd

Big numpy array to spark dataframe...


numpy apache-spark pyspark apache-spark-sql rdd

How to get the value from a key,value from a map reduce job in Scala...


scala apache-spark mapreduce rdd

How to execute multiple scripts on Spark?...


multithreading apache-spark pyspark multiprocessing rdd

need instance of RDD but returned class 'pyspark.rdd.PipelinedRDD'...


python apache-spark apache-spark-sql rdd

Comma separated data in rdd (pyspark) indices out of bound problem...


python rdd pyspark

Convert a simple one line string to RDD in Spark...


python apache-spark pyspark distributed-computing rdd

Transforming RDD[String] to RDD[myclass]...


scala apache-spark rdd

PySpark count groupby with None keys...


python apache-spark pyspark rdd

Spark RDD double compare error: value > is not a member of (Double, Double)...


scala apache-spark rdd

Spark RDD Partition effects...


apache-spark rdd partitioning

Operation on normalVectorRDD...


scala apache-spark vector rdd normal-distribution
