Search code examples
scalascala-collectionsscala-compiler

Complexity of mapping identity function to a Scala collection?


When I apply the Scala predefined identity function to a collection using the map method the original collection is returned unchanged. However, is the compiler smart enough to simply return the unchanged collection as an O(1) operation? Or will the identity function still be applied to each element in the original collection yielding a O(n) operation?


Solution

  • It is pretty straightforward to check that this is not the case. Start by making a test file with possibly optimized forms and compile it using scalac (with or without -optimise)

    /// TestMap.scala
    object TestMap {
      def mapOption[T](o: Option[T]): Option[T] = o.map(identity)
      def mapList[T](l: List[T]): List[T] = l.map(identity)
      def mapSeq[T](l: Seq[T]): Seq[T] = l.map(identity)
    }
    

    Then, checking javap -c TestMap.class you can see that nothing got optimized past specializing map to mapSeq, mapList, or mapOption:

    Compiled from "TestMap.scala"
    public final class TestMap {
      public static <T extends java/lang/Object> scala.collection.Seq<T> mapSeq(scala.collection.Seq<T>);
        Code:
           0: getstatic     #16                 // Field TestMap$.MODULE$:LTestMap$;
           3: aload_0       
           4: invokevirtual #18                 // Method TestMap$.mapSeq:(Lscala/collection/Seq;)Lscala/collection/Seq;
           7: areturn       
    
      public static <T extends java/lang/Object> scala.collection.immutable.List<T> mapList(scala.collection.immutable.List<T>);
        Code:
           0: getstatic     #16                 // Field TestMap$.MODULE$:LTestMap$;
           3: aload_0       
           4: invokevirtual #22                 // Method TestMap$.mapList:(Lscala/collection/immutable/List;)Lscala/collection/immutable/List;
           7: areturn       
    
      public static <T extends java/lang/Object> scala.Option<T> mapOption(scala.Option<T>);
        Code:
           0: getstatic     #16                 // Field TestMap$.MODULE$:LTestMap$;
           3: aload_0       
           4: invokevirtual #26                 // Method TestMap$.mapOption:(Lscala/Option;)Lscala/Option;
           7: areturn  
    

    More simply, this sort of optimization just doesn't extend well in languages with side-effects (on the other hand, in Haskell, this sort of thing happens regularly). Should the compiler optimise l.map(x => { println(x); x }) to l or not for example?