Search code examples
scalacollectionstail-recursion

How to improve this "update" function?


Suppose I've got case class A(x: Int, s: String) and need to update a List[A] using a Map[Int, String] like that:

def update(as: List[A], map: Map[Int, String]): List[A] = ???

val as = List(A(1, "a"), A(2, "b"), A(3, "c"), A(4, "d"))
val map = Map(2 -> "b1", 4 -> "d1", 5 -> "e", 6 -> "f")
update(as, map) // List(A(1, "a"), A(2, "b1"), A(3, "c"), A(4, "d1"))

I am writing update like that:

def update(as: List[A], map: Map[Int, String]): List[A] = {

  @annotation.tailrec
  def loop(acc: List[A], rest: List[A], map: Map[Int, String]): List[A] = rest match {
    case Nil => acc
    case as => as.span(a => !map.contains(a.x)) match {
      case (xs, Nil) => xs ++ acc
      case (xs, y :: ys) => loop((y.copy(s = map(y.x)) +: xs) ++ acc, ys, map - y.x)
    }
  }

  loop(Nil, as, map).reverse
}

This function works fine but it's suboptimal because it continues iterating over the input list when map is empty. Besides, it looks overcomplicated. How would you suggest improve this update function ?


Solution

  • If you can not make any supposition about the List and the Map. Then the best is to just iterate the former, juts once and in the simplest way possible; that is, using the map function.

    list.map { a =>
      map
        .get(key = a.x)
        .fold(ifEmpty = a) { s =>
          a.copy(s = s)
       }
    }
    

    However, if and only if, you can be sure that most of the time:

    1. The List will be big.
    2. The Map will be small.
    3. The keys in the Map are a subset of the values in the List.
    4. And all operations will be closer to the head of the List rather than the tail.

    Then, you could use the following approach which should be more efficient in such cases.

    def optimizedUpdate(data: List[A], updates: Map[Int, String]): List[A] = {
      @annotation.tailrec
      def loop(remaining: List[A], map: Map[Int, String], acc: List[A]): List[A] =
        if (map.isEmpty) acc reverse_::: remaining
        else remaining match {
          case a :: as =>
            map.get(key = a.x) match {
              case None =>
                loop(
                  remaining = as,
                  map,
                  a :: acc
                )
              
              case Some(s) =>
                loop(
                  remaining = as,
                  map = map - a.x,
                  a.copy(s = s) :: acc
                )
            }
          
          case Nil =>
            acc.reverse
        }
      
      loop(remaining = data, map = updates, acc = List.empty)
    }
    

    However note that the code is not only longer and more difficult to understand.
    It is actually more inefficient than the map solution (if the conditions are not meet); this is because the stdlib implementation "cheats" and constructs the List my mutating its tail instead of building it backwards and then reversing it as we did.

    In any case, as with any things performance, the only real answer is to benchmark.
    But, I would go with the map solution just for clarity or with a mutable approach if you really need speed.


    You can see the code running here.