apache-flink · flink-streaming · flink-sql

Can I use Flink's filesystem connector as lookup tables?


Flink 1.13.2 (Flink SQL) on Yarn.

A bit confused - I found two (as I understand it) different specifications of the Filesystem connector (Ververica.com vs ci.apache.org):

  1. https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/table/overview/#supported-connectors — Filesystem is "Bounded and Unbounded Scan, Lookup"

  2. https://docs.ververica.com/user_guide/sql_development/connectors.html#packaged-connectors — Only JDBC marked as usable for Lookup.

Can I use the Filesystem connector (CSV) to create lookup (dimension) tables to enrich a Kafka events table? If yes, how can it be done using Flink SQL?

(I've tried simple left joins with FOR SYSTEM_TIME AS OF a.event_datetime - it works in a test environment with a small number of Kafka events, but in production I get a GC overhead limit exceeded error. I guess that's because the small CSV tables are not broadcast to the worker nodes. In Spark I used to solve this type of problem with broadcast hints.)
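For reference, the join described above has roughly this shape (all table names, column names, and connector options are illustrative, not taken from the original post):

```sql
-- Kafka fact table with a processing-time attribute (illustrative names/options)
CREATE TABLE events (
  user_id STRING,
  event_datetime TIMESTAMP(3),
  proc_time AS PROCTIME()
) WITH (
  'connector' = 'kafka',
  'topic' = 'events',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'
);

-- Dimension table backed by a CSV file (illustrative path)
CREATE TABLE dim_users (
  user_id STRING,
  user_name STRING
) WITH (
  'connector' = 'filesystem',
  'path' = '/path/to/dim_users.csv',
  'format' = 'csv'
);

-- Temporal-style left join against the dimension table
SELECT e.user_id, e.event_datetime, d.user_name
FROM events AS e
LEFT JOIN dim_users FOR SYSTEM_TIME AS OF e.proc_time AS d
  ON e.user_id = d.user_id;
```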


Solution

  • The lookup (dimension) table needs to implement the LookupTableSource interface; as of Flink 1.13, only the HBase, JDBC, and Hive connectors implement it.

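    As a workaround, one option is to load the small CSV dimension into a store reachable via JDBC and declare it with the JDBC connector, which does implement LookupTableSource and supports a lookup cache. A minimal sketch, assuming a MySQL table `dim_users` and the illustrative `events` table from the question (connection details and names are assumptions):

    ```sql
    -- JDBC-backed dimension table usable in lookup joins (illustrative options)
    CREATE TABLE dim_users (
      user_id STRING,
      user_name STRING
    ) WITH (
      'connector' = 'jdbc',
      'url' = 'jdbc:mysql://localhost:3306/dims',
      'table-name' = 'dim_users',
      'lookup.cache.max-rows' = '10000',  -- cache rows on the TaskManager
      'lookup.cache.ttl' = '10min'        -- refresh cached rows periodically
    );

    -- Lookup join keyed on a processing-time attribute of the fact table
    SELECT e.user_id, e.event_datetime, d.user_name
    FROM events AS e
    LEFT JOIN dim_users FOR SYSTEM_TIME AS OF e.proc_time AS d
      ON e.user_id = d.user_id;
    ```

    The lookup cache keeps the hot dimension rows in memory on each worker, which plays a role similar to the Spark broadcast hint mentioned in the question.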