What Does a Variable Defined before a Scala Function Mean?


I am learning Scala from the book Scala for Data Science and its companion GitHub repo. My question is about the following function, copied here for reference.

    def fromList[T: ClassTag](index: Int, converter: String => T): DenseVector[T] =
      DenseVector.tabulate(lines.size) { row => converter(splitLines(row)(index)) }

What does the DenseVector.tabulate(lines.size) between the = sign and the block in curly braces mean? I am new to Scala (coming from Python and C++), so I cannot tell whether DenseVector.tabulate(lines.size) is a local variable of the function being defined (in which case I would expect it to be declared inside the body) or something else. From what I understand of Scala syntax, it cannot be the return type.

Also, is ClassTag equivalent to a template in C++?

To help you answer the question,

  • splitLines has type scala.collection.immutable.Vector[Array[String]]
  • lines.size is an Int (obvious, but stated for clarity; Scala has no unsigned integer type)

Solution

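  • DenseVector.tabulate is a factory method (defined on the companion object of DenseVector) that has two parameter lists with one parameter each, so altogether it takes two explicit parameters: size: Int and a function f: Int => V. In fromList, lines.size is the first argument and the block in curly braces is the second: Scala lets you write a single-argument parameter list with braces instead of parentheses. The whole expression after the = sign is therefore the body of fromList, not a variable declared before it.

    Its definition is part of the Breeze library.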

    In (pseudo-)C++ (ignoring the ClassTag), the corresponding declaration would probably look something like this:

    #include <functional>

    template<typename V>
    class DenseVector {
    public:
        // ... other class members

        // Static factory: builds a vector of `size` elements, where element i is f(i).
        static DenseVector<V> tabulate(int size, std::function<V(int)> f);
    };
    

    and then fromList would probably look something like this:

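    template<typename T>
    static DenseVector<T> fromList(int index, std::function<T(std::string)> converter) {
        // lines and splitLines are assumed to be visible from the enclosing scope.
        // [&] captures both converter and index by reference.
        return DenseVector<T>::tabulate(lines.size(), [&](int row) {
            return converter(splitLines[row][index]);
        });
    }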
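
The same shape can be tried without Breeze. Below is a minimal sketch that swaps DenseVector.tabulate for the standard library's Array.tabulate, which also takes two parameter lists and a ClassTag context bound. The object name TabulateSketch, the sample contents of lines and splitLines, and the method fromListExplicit are hypothetical and only there for illustration. The second method shows what the context bound [T: ClassTag] stands for: an extra implicit parameter of type ClassTag[T], which carries the runtime class of T so that the array can be created.

```scala
import scala.reflect.ClassTag

object TabulateSketch {
  // Hypothetical stand-ins for the book's `lines` and `splitLines`.
  val lines: Vector[String] = Vector("1,a", "2,b", "3,c")
  val splitLines: Vector[Array[String]] = lines.map(_.split(","))

  // Same shape as the book's fromList, but it builds a plain Array with
  // Array.tabulate instead of a Breeze DenseVector (an assumption of this sketch).
  // `lines.size` is the first argument list; the braces hold the second.
  def fromList[T: ClassTag](index: Int, converter: String => T): Array[T] =
    Array.tabulate(lines.size) { row => converter(splitLines(row)(index)) }

  // The same method with the context bound written out as an implicit
  // parameter and the second argument passed in parentheses.
  def fromListExplicit[T](index: Int, converter: String => T)(implicit tag: ClassTag[T]): Array[T] = {
    val cell: Int => T = row => converter(splitLines(row)(index))
    Array.tabulate(lines.size)(cell) // `tag` is picked up implicitly here
  }
}
```

With the sample data above, TabulateSketch.fromList[Int](0, _.toInt) and TabulateSketch.fromListExplicit[Int](0, _.toInt) both produce an array holding 1, 2, 3. So ClassTag is not the template itself: the type parameter T plays the role of the C++ template parameter, and the ClassTag is an additional value the compiler supplies for T.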