Search code examples
scalamachine-learningapache-flink

Apache Flink ALS with ids in Long instead of Int


I am trying the code of ALS in Flink version 1.1.3 using:

mvn archetype:generate                             \
    -DarchetypeGroupId=org.apache.flink            \
    -DarchetypeArtifactId=flink-quickstart-scala   \
    -DarchetypeVersion=1.1.3                       \
    -DgroupId=org.apache.flink.quickstart          \
    -DartifactId=flink-scala-project               \
    -Dversion=0.1                                  \
    -Dpackage=org.apache.flink.quickstart          \
    -DinteractiveMode=false

I am following the example code in: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/libs/ml/als.html and changed the Int for the Long in the Dataset

val env = ExecutionEnvironment.getExecutionEnvironment
val csvInput: DataSet[(Long, Long, Double)] = env.readCsvFile[(Long, Long, Double)]("tmp-contactos.csv")

// Setup the ALS learner
val als = ALS()
  .setIterations(10)
  .setNumFactors(10)
  .setBlocks(100)


// Set the other parameters via a parameter map
val parameters = ParameterMap()
  .add(ALS.Lambda, 0.9)
  .add(ALS.Seed, 42L)

// Calculate the factorization
als.fit(csvInput, parameters)

But it throws in runetime:

Exception in thread "main" java.lang.RuntimeException: There is no FitOperation defined for org.apache.flink.ml.recommendation.ALS which trains on a DataSet[(Long, Int, Double)]
at org.apache.flink.ml.pipeline.Estimator$$anon$4.fit(Estimator.scala:85)
at org.apache.flink.ml.pipeline.Estimator$class.fit(Estimator.scala:55)
at org.apache.flink.ml.recommendation.ALS.fit(ALS.scala:122)
at org.apache.flink.quickstart.BatchJob$.main(BatchJob.scala:119)
at org.apache.flink.quickstart.BatchJob.main(BatchJob.scala)

It is posible to use Longs instead of Ints??

I searched and found this for the 0.9 version but nothing for 1.1.13:
https://issues.apache.org/jira/browse/FLINK-2211


Solution

  • So far it is not officially supported but I've created a branch where I've fixed this limitation. You can try out this branch. I'll contribute it to Flink so that it should become part of the master in the next time.