
Why did Scala's library double its size between 2.7 and 2.8?


Comparing Scala 2.7.7 (the last 2.7.x release) with Scala 2.8.1 (the latest 2.8.x release), I gathered the following metrics:

 Scala version          |    2.7.7       2.8.1
------------------------+---------------------
 Compressed jar file    |   3.6 MB      6.2 MB
 Uncompressed files     |   8.3 MB     16.5 MB
 .class files in .      |   1.8 MB      1.7 MB
   in ./actors          | 554.0 KB      1.3 MB
   in ./annotation      |    962 B     11.7 KB
   in ./collection      |   2.8 MB      8.8 MB
   in ./compat          |   3.8 KB      3.8 KB
   in ./concurrent      | 107.3 KB    228.0 KB
   in ./io              | 175.7 KB    210.6 KB
   in ./math            |      ---    337.5 KB
   in ./mobile          |  40.8 KB     47.3 KB
   in ./ref             |  21.8 KB     26.5 KB
   in ./reflect         | 213.9 KB    940.5 KB
   in ./runtime         | 271.0 KB    338.9 KB
   in ./testing         |  47.1 KB     53.0 KB
   in ./text            |  27.6 KB     34.4 KB
   in ./util            |   1.6 MB      1.4 MB
   in ./xml             | 738.9 KB      1.1 MB

The biggest offenders are scala.collection (3.1 times bigger) and scala.reflect (4.4 times bigger). The increase in the collection package coincides with the big rewrite of the whole collection framework for 2.8, so I guess that's the cause.

I always assumed that the type system magic which computes the best return type of the collection class methods (the big change in 2.8) would happen at compile time and would not be visible after that.
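For readers unfamiliar with the mechanism being referred to: in 2.8 the "best" result type is selected by implicit search for a builder factory (CanBuildFrom in the real library), but the factory that search selects is an ordinary object passed to the method at runtime, so it does exist in the bytecode. The toy model below only illustrates that pattern; BuilderFactory, IntBag and BagDemo are hypothetical names, not the real collections API.

```scala
// Toy model of the 2.8-era CanBuildFrom pattern. All names are hypothetical.
trait BuilderFactory[From, Elem, To] {
  def build(elems: Seq[Elem]): To
}

final case class IntBag(items: Vector[Int]) {
  // `To` is fixed at compile time by implicit search, but `fac` is a real
  // object handed to `map` at runtime.
  def map[B, To](f: Int => B)(implicit fac: BuilderFactory[IntBag, B, To]): To =
    fac.build(items.map(f))
}

object IntBag {
  // Mapping Int => Int keeps the specialised container type...
  implicit val toIntBag: BuilderFactory[IntBag, Int, IntBag] =
    new BuilderFactory[IntBag, Int, IntBag] {
      def build(elems: Seq[Int]): IntBag = IntBag(elems.toVector)
    }

  // ...while mapping Int => String falls back to a plain Vector.
  implicit val toStrings: BuilderFactory[IntBag, String, Vector[String]] =
    new BuilderFactory[IntBag, String, Vector[String]] {
      def build(elems: Seq[String]): Vector[String] = elems.toVector
    }
}

object BagDemo {
  def main(args: Array[String]): Unit = {
    val bag = IntBag(Vector(1, 2, 3))
    println(bag.map(_ * 2))        // IntBag(Vector(2, 4, 6))
    println(bag.map(i => s"#$i"))  // Vector(#1, #2, #3)
  }
}
```

Each such factory is a compiled class plus an instance, which is one way the "compile-time" selection leaves a runtime footprint.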

  • Why did the rewrite result in such a big increase in size?

As far as I know, there are plans to improve scala.io, scala.reflect and scala.swing. There are also at least two other actor libraries, one doing the same as scala.actors (Lift actors) and one doing a lot more (Akka), and scala.testing has officially been superseded by third-party testing libraries.

  • Will an improved scala.io, scala.reflect or scala.swing result in a comparable size increase or was the case of scala.collection a really special circumstance?

  • Is delegating the actors implementation to Lift or Akka being considered, should a usable module system arrive in JDK 8?

  • Are there plans to finally remove scala.testing or split it from the library jar-file?

  • Might the inclusion of SAM types, Defender Methods or MethodHandles in JDK 7/JDK 8 make it possible to reduce the number of classes the Scala compiler has to generate for anonymous classes, inner classes, singletons, etc.?


Solution

  • I'm not in any way associated with the Scala project or any of the companies that support it. So take everything below as my own personal opinion.

    • Why did the rewrite result in such a big increase in size?

    Most likely, not the rewrite itself, but specialization. In particular, this definition of Function1:

    trait Function1[@specialized(scala.Int, scala.Long, scala.Float, scala.Double) -T1,
                    @specialized(scala.Unit, scala.Boolean, scala.Int, scala.Float, scala.Long, scala.Double) +R]

    means all methods in Function1 will be implemented 35 times: one for each of the 5 choices for T1 (Int, Long, Float, Double and AnyRef) times each of the 7 choices for R (Unit, Boolean, Int, Float, Long, Double and AnyRef).
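    The specialized variants show up as ordinary methods on the compiled Function1 type. The sketch below assumes Scala 2.12+ or Scala 3 (whose standard library still specializes Function1) and lists them with plain Java reflection; scalac names them `apply$mc<types>$sp`, for example `apply$mcII$sp` for Int => Int. SpecializedApplies is a hypothetical helper, and the exact list depends on the library version you run it against.

    ```scala
    object SpecializedApplies {
      // Names of the specialized `apply` variants declared on scala.Function1.
      def names: Seq[String] =
        classOf[Function1[_, _]].getMethods.toSeq
          .map(_.getName)
          .filter(n => n.startsWith("apply$mc") && n.endsWith("$sp"))
          .distinct
          .sorted

      def main(args: Array[String]): Unit = {
        println(s"${names.size} specialized apply variants on Function1:")
        names.foreach(n => println("  " + n))
      }
    }
    ```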

    Now, look at the Scaladoc and see the known subclasses of Function1. I won't even bother copying them here. Function0 and Function2 were also specialized, though their impact is much smaller.
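    Part of why that subclass list is so long: many collection types are themselves functions. In the standard library, Seq[A] is a PartialFunction[Int, A], Map[K, V] is a PartialFunction[K, V] and Set[A] is an A => Boolean, so all of them sit below Function1 in the type hierarchy. A small sketch (CollectionsAsFunctions is a hypothetical name; tested mentally against 2.13-era libraries):

    ```scala
    object CollectionsAsFunctions {
      // Each collection is used directly where a Function1 is expected.
      val byIndex: Int => String = Vector("zero", "one", "two") // Seq is a PartialFunction[Int, A]
      val byKey: String => Int = Map("a" -> 1, "b" -> 2)        // Map is a PartialFunction[K, V]
      val member: Int => Boolean = Set(1, 2, 3)                 // Set is an A => Boolean

      def main(args: Array[String]): Unit = {
        println(byIndex(1)) // one
        println(byKey("b")) // 2
        println(member(4))  // false
      }
    }
    ```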

    If anything, I'd bet the rewrite decreased the final footprint, because of the extensive code reuse it enabled.
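    The reuse comes from the "template trait" design introduced in 2.8 and kept, in reworked form, by later versions: a new collection supplies one or two primitive operations and inherits everything else from shared implementations. In 2.8 those primitives were `foreach` for Traversable and `iterator` for Iterable; the sketch below targets the newer (2.12/2.13) library instead, and Countdown is a hypothetical example class.

    ```scala
    // Only `iterator` is defined here; toList, filter, sum, map, etc. are all
    // inherited from the shared Iterable implementation.
    final class Countdown(from: Int) extends Iterable[Int] {
      def iterator: Iterator[Int] = Iterator.iterate(from)(_ - 1).takeWhile(_ >= 0)
    }

    object CountdownDemo {
      def main(args: Array[String]): Unit = {
        val c = new Countdown(3)
        println(c.toList)                    // List(3, 2, 1, 0)
        println(c.filter(_ % 2 == 0).toList) // List(2, 0)
        println(c.sum)                       // 6
      }
    }
    ```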

    As for reflect, it went from being almost non-existent to providing fundamental features to the new collection library, so it is no surprise it had a big relative increase.
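    Presumably this refers to things such as manifests: the 2.8 collections relied on scala.reflect's ClassManifest to create arrays of a generic element type (e.g. `toArray[B >: A : ClassManifest]`). ClassManifest was later superseded by scala.reflect.ClassTag (Scala 2.10), which this sketch uses; GenericArrays and `fill` are hypothetical names.

    ```scala
    import scala.reflect.ClassTag

    object GenericArrays {
      // Creating an Array[T] needs T's runtime class; the ClassTag context
      // bound makes the compiler pass it in as an implicit argument.
      def fill[T: ClassTag](n: Int, elem: T): Array[T] = {
        val arr = new Array[T](n)
        var i = 0
        while (i < n) { arr(i) = elem; i += 1 }
        arr
      }

      def main(args: Array[String]): Unit = {
        val ints = fill(3, 7)
        println(ints.mkString(", "))                  // 7, 7, 7
        println(ints.getClass == classOf[Array[Int]]) // true: a primitive int[]
      }
    }
    ```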

    • Will an improved scala.io, scala.reflect or scala.swing result in a comparable size increase or was the case of scala.collection a really special circumstance?

    Not comparable, because the rewrite had nothing to do with it. However, a true scala.io library would certainly be much bigger than the little that exists nowadays, and I'd expect the same of a true reflection system for Scala (there have been papers about the latter). As for swing, I don't expect much beyond incremental improvements, mostly wrappers around Java libraries, so I doubt it would change much in size.

    • Is delegating the actors implementation to Lift or Akka being considered, should a usable module system arrive in JDK 8?

    Each implementation has its own strengths, and I haven't seen any signs of convergence for the time being. As for JDK 8, how is Scala supposed to remain compatible with JDK 5 while modularizing for JDK 8? I don't mean it is impossible, but it is quite likely too much effort for the available resources.

    • Are there plans to finally remove scala.testing or split it from the library jar-file?

    It has been discussed, but there's also a concern about having some sort of testing framework available for the compiler itself, with a flexibility that a third-party testing framework would not provide. It might well be moved to the compiler jar instead, though (or removed and replaced there with something else).

    • Might the inclusion of SAM types, Defender Methods or MethodHandles in JDK 7/JDK 8 make it possible to reduce the number of classes the Scala compiler has to generate for anonymous classes, inner classes, singletons, etc.?

    Sure, once no one uses JDK 5/JDK 6 anymore. Of course, if JDK 7/JDK 8 get widespread adoption and the improvements are sufficiently worthwhile, there might well come a time when Scala is distributed with two distinct jar files for its library. But at this point, it is too early to conjure up hypothetical scenarios.
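    In hindsight, something close to this did happen: Scala 2.12 requires Java 8, compiles closures through the JVM's LambdaMetafactory (invokedynamic) instead of emitting one anonymous class per closure, and lets Scala function literals implement Java single-abstract-method interfaces directly. A small sketch of that SAM conversion, assuming Scala 2.12 or later (SamConversion is a hypothetical name):

    ```scala
    import java.util.Comparator
    import java.util.function.IntUnaryOperator

    object SamConversion {
      // Scala function literals converted to Java functional interfaces.
      val increment: IntUnaryOperator = x => x + 1
      val byLength: Comparator[String] = (a, b) => Integer.compare(a.length, b.length)

      var runs = 0
      val task: Runnable = () => runs += 1

      def main(args: Array[String]): Unit = {
        println(increment.applyAsInt(41))          // 42
        println(byLength.compare("abc", "de") > 0) // true
        task.run()
        println(runs)                              // 1
      }
    }
    ```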