Tags: sql, scala, join, apache-flink, flink-sql

Self-join a single table on two columns with the Flink Table API


I have a table with data, and I need to join it with itself on two fields.

I wrote the following query, but it does not work:

SELECT * 
FROM Data t1 
JOIN Data t2 ON t1.s = t2.o

The code is:

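import org.apache.flink.table.api.Types
import org.apache.flink.table.sources.CsvTableSource

// Describe the CSV file as a table with four string columns
val csvTableSource = CsvTableSource
  .builder
  .path("src/main/resources/data.dat")
  .field("s", Types.STRING)
  .field("p", Types.STRING)
  .field("o", Types.STRING)
  .field("TIMESTAMP", Types.STRING)
  .fieldDelimiter(",")
  .ignoreFirstLine
  .ignoreParseErrors
  .commentPrefix("%")
  .build()
// tableEnv is the table environment created earlier (not shown in the question)
tableEnv.registerTableSource("Data", csvTableSource)

// Self-join: match rows whose s equals the o of another row
val query = "SELECT * FROM Data t1 JOIN Data t2 ON t1.s = t2.o"
val table = tableEnv.sqlQuery(query)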

I get the following exception:

Exception in thread "main" org.apache.flink.table.api.TableException: Cannot generate a valid execution plan for the given query: 

FlinkLogicalJoin(condition=[=($0, $6)], joinType=[inner])
  FlinkLogicalTableSourceScan(table=[[Data]], fields=[s, p, o, TIMESTAMP], source=[CsvTableSource(read fields: s, p, o, TIMESTAMP)])
  FlinkLogicalTableSourceScan(table=[[Data]], fields=[s, p, o, TIMESTAMP], source=[CsvTableSource(read fields: s, p, o, TIMESTAMP)])

This exception indicates that the query uses an unsupported SQL feature.
Please check the documentation for the set of currently supported SQL features.

Solution

I guess you are running this query in a streaming environment. Non-windowed joins on streaming tables were only added in Flink 1.5.0, so the query uses a feature that Flink 1.4.2 does not support yet.

You can either switch to a batch environment, which should be possible given that you are reading CSV files, or upgrade to Flink 1.5.0.
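To illustrate the first option, here is a minimal sketch of the same job on a batch environment. It assumes Flink 1.4.x with the Scala Table API on the classpath. The object name `BatchSelfJoin` and the final `print()` call are only illustrative and are not taken from the question. The table source and the query are unchanged; only the environment differs.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.table.api.{TableEnvironment, Types}
import org.apache.flink.table.api.scala._
import org.apache.flink.table.sources.CsvTableSource
import org.apache.flink.types.Row

// Hypothetical object name, used only for this sketch
object BatchSelfJoin {
  def main(args: Array[String]): Unit = {
    // Batch (DataSet) environment instead of a StreamExecutionEnvironment
    val env = ExecutionEnvironment.getExecutionEnvironment
    val tableEnv = TableEnvironment.getTableEnvironment(env)

    // Same CSV source definition as in the question
    val csvTableSource = CsvTableSource
      .builder
      .path("src/main/resources/data.dat")
      .field("s", Types.STRING)
      .field("p", Types.STRING)
      .field("o", Types.STRING)
      .field("TIMESTAMP", Types.STRING)
      .fieldDelimiter(",")
      .ignoreFirstLine
      .ignoreParseErrors
      .commentPrefix("%")
      .build()
    tableEnv.registerTableSource("Data", csvTableSource)

    // Same self-join as in the question
    val table = tableEnv.sqlQuery(
      "SELECT * FROM Data t1 JOIN Data t2 ON t1.s = t2.o")

    // Convert the result to a DataSet and print it; print() triggers execution
    table.toDataSet[Row].print()
  }
}
```

If the job has to stay on a streaming environment, the code can remain as it is and only the Flink dependency needs to move to 1.5.0 or later.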