Tags: scala, apache-spark, rdd

Aggregating sum for RDD in Scala (Spark)


If I have a variable such as books: RDD[(String, Integer, Integer)], how do I merge entries that share the same String (which could represent a title) and sum the two corresponding integers (which could represent pages and price)?

ex:

[("book1", 20, 10),
 ("book2", 5, 10),
 ("book1", 100, 100)]

becomes

[("book1", 120, 110),
 ("book2", 5, 10)]

Solution

  • With an RDD you can use reduceByKey.

    case class Book(name: String, i: Int, j: Int) {
      // Combine two Books with the same name by summing both counters.
      def +(b: Book): Book =
        if (name == b.name) Book(name, i + b.i, j + b.j)
        else throw new IllegalArgumentException(s"Cannot add Books with different names: $name, ${b.name}")
    }
    
    val rdd = sc.parallelize(Seq(
       Book("book1", 20, 10),
       Book("book2", 5, 10),
       Book("book1", 100, 100)))
    
    val aggRdd = rdd.map(book => (book.name, book))
       .reduceByKey(_ + _) // reduce using our `+` defined on Book
       .map(_._2)          // drop the key; keep only the combined Books
    
    aggRdd.foreach(println)
    // Book(book1,120,110)
    // Book(book2,5,10)
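
  • If you'd rather keep the plain tuples instead of introducing a case class, a minimal sketch (assuming `sc` is an existing SparkContext, as above) keys each record by its title, sums the two counters pairwise with reduceByKey, and maps back to triples:

    val books = sc.parallelize(Seq(
       ("book1", 20, 10),
       ("book2", 5, 10),
       ("book1", 100, 100)))

    val summed = books
       .map { case (title, pages, price) => (title, (pages, price)) }   // key by title
       .reduceByKey { case ((p1, c1), (p2, c2)) => (p1 + p2, c1 + c2) } // sum pairwise
       .map { case (title, (pages, price)) => (title, pages, price) }   // back to triples

    summed.collect().foreach(println)
    // (book1,120,110)
    // (book2,5,10)

    Either way, reduceByKey combines values map-side before the shuffle, so it scales better than groupByKey followed by a manual sum.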