apache-beam beam-sql

How to output nested Row from Beam SQL (SqlTransform)?


I want SqlTransform (Beam SQL) to output a Row that contains a nested Row, but I can't get it to work.

Questions:

  1. What is the proper way to output a Row with a nested Row from SqlTransform? (The Row type is described in the docs, so I believe it's supported.)
  2. If this is a bug or a missing feature, is the problem in Beam itself, or is it runner-dependent? (I'm currently using DirectRunner, but plan to use DataflowRunner in the future.)

Version info:

  • OS: macOS 10.15.7 (Catalina)
  • Java: 11.0.11 (AdoptOpenJDK)
  • Beam SDK: 2.32.0

Here's what I've tried, with no luck.

With Calcite dialect

SELECT ROW(foo, bar) as my_nested_row FROM PCOLLECTION

I expected this to output a row with the following schema:

Field{name=my_nested_row, description=, type=ROW<foo STRING NOT NULL, bar INT64 NOT NULL> NOT NULL, options={{}}}

but the row is actually split into scalar fields:

Field{name=my_nested_row$$0, description=, type=STRING NOT NULL, options={{}}}
Field{name=my_nested_row$$1, description=, type=INT64 NOT NULL, options={{}}}

With ZetaSQL dialect

SELECT STRUCT(foo, bar) as my_nested_row FROM PCOLLECTION

I got an error

java.lang.UnsupportedOperationException: Does not support expr node kind RESOLVED_MAKE_STRUCT
    at org.apache.beam.sdk.extensions.sql.zetasql.translation.ExpressionConverter.convertRexNodeFromResolvedExpr (ExpressionConverter.java:363)
    at org.apache.beam.sdk.extensions.sql.zetasql.translation.ExpressionConverter.convertRexNodeFromResolvedExpr (ExpressionConverter.java:323)
    at org.apache.beam.sdk.extensions.sql.zetasql.translation.ExpressionConverter.convertRexNodeFromComputedColumnWithFieldList (ExpressionConverter.java:375)
    at org.apache.beam.sdk.extensions.sql.zetasql.translation.ExpressionConverter.retrieveRexNode (ExpressionConverter.java:203)
    at org.apache.beam.sdk.extensions.sql.zetasql.translation.ProjectScanConverter.convert (ProjectScanConverter.java:45)
    at org.apache.beam.sdk.extensions.sql.zetasql.translation.ProjectScanConverter.convert (ProjectScanConverter.java:29)
    at org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertNode (QueryStatementConverter.java:102)
    at org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convert (QueryStatementConverter.java:89)
    at org.apache.beam.sdk.extensions.sql.zetasql.translation.QueryStatementConverter.convertRootQuery (QueryStatementConverter.java:55)
    at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel (ZetaSQLPlannerImpl.java:98)
    at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal (ZetaSQLQueryPlanner.java:197)
    at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel (ZetaSQLQueryPlanner.java:185)
    at org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery (BeamSqlEnv.java:111)
    at org.apache.beam.sdk.extensions.sql.SqlTransform.expand (SqlTransform.java:171)
    at org.apache.beam.sdk.extensions.sql.SqlTransform.expand (SqlTransform.java:109)
    at org.apache.beam.sdk.Pipeline.applyInternal (Pipeline.java:548)
    at org.apache.beam.sdk.Pipeline.applyTransform (Pipeline.java:482)
    at org.apache.beam.sdk.values.PCollection.apply (PCollection.java:363)
    at dev.tmshn.playbeam.Main.main (Main.java:29)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:566)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
    at java.lang.Thread.run (Thread.java:829)

Solution

  • Unfortunately, Beam SQL does not yet support nested rows, mainly because Calcite lacks support for them (and the ZetaSQL implementation therefore lacks it as well). See this similar question focused on Dataflow.

    On the bright side, the Jira issue tracking this support seems to be resolved for 2.34.0, so proper support is likely upcoming.
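
    Until then, one possible workaround is to select only the scalar columns in SQL and build the nested Row in Java afterwards. The following is a minimal sketch, not a tested recipe for 2.32.0: it assumes the input has the fields foo (STRING) and bar (INT64) from the question, and the class name NestAfterSql and the helpers nest and queryAndNest are hypothetical names chosen for illustration.

```java
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TypeDescriptors;

// Hypothetical helper class; the field names follow the question.
public class NestAfterSql {
  // Schema of the nested row: ROW<foo STRING, bar INT64>.
  static final Schema INNER =
      Schema.builder().addStringField("foo").addInt64Field("bar").build();

  // Desired output schema: a single field that holds the nested row.
  static final Schema OUTER =
      Schema.builder().addRowField("my_nested_row", INNER).build();

  // Wraps a flat (foo, bar) row into a row with one nested-row field.
  static Row nest(Row flat) {
    Row inner =
        Row.withSchema(INNER)
            .addValues(flat.getString("foo"), flat.getInt64("bar"))
            .build();
    return Row.withSchema(OUTER).addValue(inner).build();
  }

  // Runs SQL over scalar columns only, then nests the result in Java.
  static PCollection<Row> queryAndNest(PCollection<Row> input) {
    return input
        .apply(SqlTransform.query("SELECT foo, bar FROM PCOLLECTION"))
        .apply(MapElements.into(TypeDescriptors.rows()).via((Row flat) -> nest(flat)))
        // The output schema is set explicitly because it cannot be inferred
        // from the lambda.
        .setRowSchema(OUTER);
  }
}
```

    Because the nesting happens in an ordinary MapElements step rather than in the SQL planner, this approach should not depend on the runner, though that is an expectation rather than something verified on Dataflow here.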