Search code examples
Why are the results of RDD.getNumPartitions and RDD.mapPartitions different?...

apache-spark, rdd

Spark pairRDD vs Dataframe for join optimization...

dataframe, apache-spark, join, rdd

How to remove header by using filter function in spark?...

scala, apache-spark, rdd

spark rdd, need to reduce over (key,(tuple))...

scala, apache-spark, rdd

Would Spark unpersist the RDD itself when it realizes it won't be used anymore?...

apache-spark, hadoop, rdd, distributed-computing

Number of partitions in RDD and performance in Spark...

performance, apache-spark, pyspark, rdd

Spark Scala convert RDD with Case Class to simple RDD...

scala, apache-spark, rdd, case-class

PySpark RDD: Manipulating Inner Array...

apache-spark, pyspark, rdd

Is there a PySpark function that will merge data from a column for rows with same id?...

dataframe, apache-spark, pyspark, rdd

How to get data from a specific partition in Spark RDD?...

apache-spark, rdd

RDD foreach method provides no results...

apache-spark, foreach, pyspark, rdd

Summary of a Column (Achieving a Cube Function on Spark Dataset)...

scala, apache-spark, apache-spark-sql, rdd, apache-spark-dataset

Splitting a text file based on empty lines in Spark...

python, apache-spark, pyspark, apache-spark-sql, rdd

Sorting an rdd after using groupbykey using values...

java, apache-spark, group-by, rdd

Pyspark reduce function causes StackOverflowError...

apache-spark, pyspark, apache-spark-sql, rdd

RDD get type and index of each element...

java, apache-spark, rdd, mapper

How to map filenames to RDD using sc.textFile("s3n://bucket/*.csv")?...

amazon-s3, mapping, apache-spark, filenames, rdd

Use SparkContext hadoop configuration within RDD methods/closures, like foreachPartition...

java, hadoop, apache-spark, rdd

convert html to json using rdd.map...

pyspark, xml-parsing, html-parsing, rdd

Spark read file from S3 using sc.textFile ("s3n://...)...

java, scala, apache-spark, rdd, hortonworks-data-platform

How to calculate average by category in pyspark streaming?...

python, pyspark, spark-streaming, rdd, dstream

Spark dataframe map root key with elements of array of another column of string type...

scala, apache-spark, apache-spark-sql, rdd, array-map

How to find end position of a column's value in another column in Pyspark?...

dataframe, apache-spark, pyspark, databricks, rdd

How is fault tolerance achieved when there is no data replication in spark?...

apache-spark, rdd, directed-acyclic-graphs

Find elements in one RDD but not in the other RDD...

java, apache-spark, mapreduce, rdd

How to iterate a RDD and remove the field if it exist in a list using PySpark...

python-3.x, dictionary, pyspark, row, rdd

How to convert PySpark dataframe to dictionary: first column as main key, the other columns and thei...

python, apache-spark, dictionary, pyspark, rdd

RDD to DataFrame in spark and scala...

dataframe, scala, apache-spark, rdd

Group and merge RDD pair keys and values...

apache-spark, pyspark, apache-spark-sql, rdd, key-value

pyspark.createDataFrame(rdd, schema) returns just null values...

apache-spark, pyspark, apache-spark-sql, rdd
