Tags: ruby-on-rails, ruby, scala, apache-spark, chaining

Scala equivalent for ActiveSupport's Object.try in Ruby


The try method is a common extension to core Ruby; for example, it is available by default in Rails. try executes a method or a block of code on an object only if that object is not nil (Ruby's null). Its usage comes in three flavors:

  1. Calling a method on a non-nil object and returning the result:
    customer_or_nil.try(:save).

  2. Chaining an arbitrary block of code in an expression if the result so far is not nil:
    obj.try { |non_nil_obj| do_something(non_nil_obj) }.

  3. An extension of (2), used not for optional processing but as a way to continue a chained expression when the method in step n+1 requires an argument that must be calculated from the result of step n:
    data.analyze.try { |result| result.compress(optimal_settings(result)) }.save

I am specifically interested in the Scala equivalent of (3), or an alternative Scala idiom that would allow, for example, this code working with Apache Spark's DataFrame:

val df = ctx.sql("select * from my_table")
df.
  repartition(max(1, df.rdd.partitions.size / 4)).
  saveAsTable("repartitioned_table")

to be refactored into something like the following (using Ruby syntax):

ctx.
  sql("select * from my_table").
  try { |df| df.repartition(max(1, df.rdd.partitions.size / 4)) }.
  saveAsTable("repartitioned_table")

The goal of the refactoring is to improve readability by maintaining a single method chain and to reduce scope pollution by keeping df tightly scoped to the step in the chain where it is absolutely needed.

Note: I am specifically not interested in discussions of the pros and cons of using Option for optional processing, as that is not the use case of try this question is concerned with.


Solution

  • Not in the standard library, but an equivalent to (3) is easy to write (try is a keyword in Scala, so the method is renamed to ap, short for apply):

    implicit class TryOp[A](x: A) {
      def ap[B](f: A => B): B = f(x)
    }
    
    ctx.
      sql("select * from my_table").
      ap { df => df.repartition(max(1, df.rdd.partitions.size / 4)) }.
      saveAsTable("repartitioned_table")