Scala FoldLeft function


I have below sample data:

Day,JD,Month,Year,PRCP(in),SNOW(in),TAVE (F),TMAX (F),TMIN (F) 
1,335,12,1895,0,0,12,26,-2 
2,336,12,1895,0,0,-3,11,-16 
.
.
.

Now I need to find the hottest day, i.e. the one with the maximum TMAX. I have calculated it with reduceLeft, but couldn't figure out how to do it with foldLeft. Below is the code:

    import scala.io.Source

    case class TempData(day: Int, DayOfYear: Int, month: Int, year: Int,
                        precip: Double, snow: Double, tave: Double, tmax: Double, tmin: Double)

    object TempData {
      def main(args: Array[String]): Unit = {
        val source = Source.fromFile("C:///DataResearch/SparkScala/MN212142_9392.csv.txt")
        val lines = source.getLines().drop(1) // skip the header row
        // Columns: Day, JD, Month, Year, PRCP, SNOW, TAVE, TMAX, TMIN (indices 0 to 8)
        val data = lines.map { line =>
          val p = line.split(",")
          TempData(p(0).toInt, p(1).toInt, p(2).toInt, p(3).toInt,
                   p(4).toDouble, p(5).toDouble, p(6).toDouble, p(7).toDouble, p(8).toDouble)
        }.toArray
        source.close()

        val HottestDay = data.maxBy(_.tmax)
        println(s"Hot day 1 is $HottestDay")

        val HottestDay2 = data.reduceLeft((d1, d2) => if (d1.tmax >= d2.tmax) d1 else d2)
        println(s"Hot day 2 is $HottestDay2")

        val HottestDay3 = data.foldLeft(0.0,0.0).....
        println(s"Hot day 3 is $HottestDay3")

I cannot figure out how to use the foldLeft function here.


Solution

  • foldLeft is a more general reduceLeft: its result type does not have to be a supertype of the element type, and it lets you supply the value to return when there is nothing to fold over. reduceLeft can be implemented in terms of foldLeft like so:

    def reduceLeft[B >: A](op: (B, A) => B): B = {
      if (this.isEmpty) throw new UnsupportedOperationException("empty collection")
      else this.tail.foldLeft(this.head)(op)
    }
    

    Applying that transformation, assuming that data is not empty, you can thus translate

    data.reduceLeft((d1, d2) => if (d1.tmax >= d2.tmax) d1 else d2)
    

    into

    data.tail.foldLeft(data.head) { (d1, d2) =>
      if (d1.tmax >= d2.tmax) d1
      else d2
    }
    

    If data has size 1, then data.tail is empty and the result is data.head (which is trivially the maximum).
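
The two properties mentioned above (a result type unrelated to the element type, and a defined value for an empty collection) can be illustrated with a minimal sketch. The object name `FoldLeftSketch`, the helpers `hottest` and `maxTmax`, and the `sample` value are hypothetical names introduced here for illustration; the two records are the rows shown in the question.

```scala
case class TempData(day: Int, DayOfYear: Int, month: Int, year: Int,
                    precip: Double, snow: Double, tave: Double, tmax: Double, tmin: Double)

// Hypothetical helper object, not part of the original question or answer.
object FoldLeftSketch {
  // The two sample rows from the question.
  val sample: Seq[TempData] = Seq(
    TempData(1, 335, 12, 1895, 0, 0, 12, 26, -2),
    TempData(2, 336, 12, 1895, 0, 0, -3, 11, -16)
  )

  // Empty-safe variant: the seed is None, so an empty collection yields None
  // instead of throwing, as reduceLeft would.
  def hottest(data: Seq[TempData]): Option[TempData] =
    data.foldLeft(Option.empty[TempData]) { (best, d) =>
      best match {
        case Some(b) if b.tmax >= d.tmax => best // keep the current maximum
        case _                           => Some(d) // first element, or a hotter day
      }
    }

  // The accumulator is a Double, not a TempData: only the highest TMAX is kept.
  // The seed NegativeInfinity is what an empty collection would return.
  def maxTmax(data: Seq[TempData]): Double =
    data.foldLeft(Double.NegativeInfinity)((m, d) => math.max(m, d.tmax))

  def main(args: Array[String]): Unit = {
    println(s"Hottest day: ${hottest(sample)}")
    println(s"Highest TMAX: ${maxTmax(sample)}")
  }
}
```

Seeding with `Option.empty` avoids the `data.head` / `data.tail` split, at the cost of wrapping the result in an `Option` that the caller has to unwrap.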