Tags: scala, google-bigquery, reflection, apache-beam, spotify-scio

Create Scala case classes for SCIO type-safe read/write dynamically


I'm trying to generate Scala classes dynamically, which should then be used as type-safe inputs and outputs for SCIO reads from and writes to GCP BigQuery.

Target example:

import com.spotify.scio._
import com.spotify.scio.bigquery._
import com.spotify.scio.bigquery.types.BigQueryType

@BigQueryType.fromTable("dataset.SOURCE_TABLE")
class SOURCE_TABLE

@BigQueryType.toTable
case class TARGET_TABLE(id: String, name: String, desc: String)

def main(cmdlineArgs: Array[String]): Unit = {
  val (sc, args) = ContextAndArgs(cmdlineArgs)
  sc.typedBigQuery[SOURCE_TABLE]()  // Read from BQ
    .map( row => transformation(row) ) // Transform -> SCollection[TARGET_TABLE]
    .saveAsTypedBigQueryTable(Table.Spec(args("TARGET_TABLE")))  // save to BQ
  sc.run()
  ()
}

As input there are the dataset, SOURCE_TABLE, TARGET_TABLE, and a list of target fields, so I can build up the source strings of the generated classes. All these values are retrieved dynamically from third-party sources (JSON, XML, etc.) and can change on every execution.

So the source of the generated classes can be represented as:

val sourceString =
  s"""
     |import com.spotify.scio.bigquery.types.BigQueryType
     |
     |@BigQueryType.fromTable("$dataset.$SOURCE_TABLE")
     |class $SOURCE_TABLE
     |
   """.stripMargin

val targetString =
  s"""
     |import com.spotify.scio.bigquery.types.BigQueryType
     |
     |@BigQueryType.toTable
     |case class $TARGET_TABLE($fieldDefinitions)
   """.stripMargin

These sources are supposed to be compiled into classes whose types are required for the SCIO BigQuery I/O.
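
For illustration, $fieldDefinitions could be assembled from the field list roughly like this (a sketch; the (name, BigQuery type) pairs and the type mapping are only indicative and have to match Scio's actual mapping):

// Hypothetical input: field names and BigQuery types parsed from the third-party JSON/XML.
val targetFields: List[(String, String)] =
  List("id" -> "STRING", "name" -> "STRING", "desc" -> "STRING")

// Rough mapping from BigQuery standard SQL types to Scala types; extend as needed.
def scalaType(bqType: String): String = bqType match {
  case "STRING"  => "String"
  case "INT64"   => "Long"
  case "FLOAT64" => "Double"
  case "BOOL"    => "Boolean"
  case other     => sys.error(s"Unsupported BigQuery type: $other")
}

// Produces "id: String, name: String, desc: String"
val fieldDefinitions: String =
  targetFields
    .map { case (name, bqType) => s"$name: ${scalaType(bqType)}" }
    .mkString(", ")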

Scala version: 2.12.17

I tried to use the Scala runtime Mirror and Toolbox (from this answer, from this one, etc.), but all variants throw the same error:

enable macro paradise (2.12) or -Ymacro-annotations (2.13) to expand macro annotations

It's obvious that the Toolbox's internal compiler doesn't see the build.sbt setting:

addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.1" cross CrossVersion.full)

Besides that, it is mentioned here that the Toolbox is not intended for such complex things.

So I decided to apply an approach using the scala.tools.nsc package, as described in this answer, but it throws the same error about the lack of macro annotations.

Thus the main question: is there any way to add the required compiler plugin settings to scala.tools.nsc.{Global, Settings}, or to apply any other approach to generate such annotated classes dynamically?

import scala.reflect.internal.util.BatchSourceFile
import scala.reflect.io.AbstractFile
import scala.tools.nsc.{Global, Settings}

def compileCode(sources: List[String], classpathDirectories: List[AbstractFile], outputDirectory: AbstractFile): Unit = {
  val settings = new Settings
  classpathDirectories.foreach(dir => settings.classpath.prepend(dir.toString))
  settings.outputDirs.setSingleOutput(outputDirectory)
  settings.usejavacp.value = true
  //*****
  // Add macros paradise compiler plugin?
  //*****
  val global = new Global(settings)
  val files = sources.zipWithIndex.map { case (code, i) => new BatchSourceFile(s"(inline-$i)", code) }
  (new global.Run).compileSources(files)
}
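
For context, a possible invocation of this method, compiling into an in-memory directory and loading the generated classes afterwards, could look roughly like this (the classloader wiring is illustrative only):

import scala.reflect.internal.util.AbstractFileClassLoader
import scala.reflect.io.VirtualDirectory

// Sketch: compile the generated sources in memory, then load the resulting classes.
val outputDir = new VirtualDirectory("(memory)", None)
compileCode(List(sourceString, targetString), classpathDirectories = Nil, outputDirectory = outputDir)

val classLoader = new AbstractFileClassLoader(outputDir, getClass.getClassLoader)
val targetClass = classLoader.loadClass(TARGET_TABLE) // TARGET_TABLE holds the generated class name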

Solution

  • You can switch on the paradise plugin for the Toolbox by feeding it the corresponding command-line option:

    import scala.reflect.runtime
    import scala.tools.reflect.ToolBox
    val rm = runtime.currentMirror
    
    val tb = rm.mkToolBox(options = "-Xplugin:/path/to/.cache/coursier/v1/https/repo1.maven.org/maven2/org/scalamacros/paradise_2.12.18/2.1.1/paradise_2.12.18-2.1.1.jar")
    

    in Scala 2.12 or

    val tb = rm.mkToolBox(options = "-Ymacro-annotations")
    

    in Scala 2.13.

    If you prefer the actual compiler over the Toolbox, then you can feed the option in the following way:

    settings.plugin.value = List("/path/to/.cache/coursier/v1/https/repo1.maven.org/maven2/org/scalamacros/paradise_2.12.18/2.1.1/paradise_2.12.18-2.1.1.jar")
    

    in Scala 2.12 or

    settings.YmacroAnnotations.value = true
    

    in Scala 2.13.
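
    Putting the Toolbox variant together with the generated source from the question, a rough sketch could look as follows (the jar path, the hardcoded field list, and the trailing classOf expression are illustrative only; it assumes scio-bigquery is on the runtime classpath):

    import scala.reflect.runtime
    import scala.tools.reflect.ToolBox

    // Path to the full-cross-versioned paradise jar matching your Scala version
    // (illustrative; resolve it from your local Coursier/Ivy cache).
    val paradiseJar = "/path/to/paradise_2.12.17-2.1.1.jar"
    val tb = runtime.currentMirror.mkToolBox(options = s"-Xplugin:$paradiseJar")

    // Generated source plus a trailing expression, so that eval returns the
    // runtime class of the expanded case class.
    val generated =
      """
        |import com.spotify.scio.bigquery.types.BigQueryType
        |
        |@BigQueryType.toTable
        |case class TARGET_TABLE(id: String, name: String, desc: String)
        |
        |classOf[TARGET_TABLE]
        |""".stripMargin

    val targetClass = tb.eval(tb.parse(generated)).asInstanceOf[Class[_]]

    For the scala.tools.nsc route, the settings.plugin assignment above is exactly what goes at the "Add macros paradise compiler plugin?" placeholder in the question's compileCode method.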