I have created an Apache NiFi data pipeline that fetches data from a MySQL table and, after some data transformation, loads it into a PostgreSQL table. The first processor is a GenerateTableFetch, which is followed by an ExecuteSQL processor.
The pipeline works perfectly with source tables containing over 14 million entries (around 2700 MB). However, with a table containing over 420 million entries (around 69000 MB), the GenerateTableFetch processor produces no output.
I have set the Partition Size to 100000 rows and use the column id, which is the primary key of the source table, as the Column for Value Partitioning. The same column is also set as the Maximum-value Columns property. The Max Wait Time is set to zero, which allows long-running SQL queries.
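For context, with these settings GenerateTableFetch should emit one flow file per id range, carrying queries roughly of the following shape (a sketch only: the table name source_table and the id boundaries are placeholders, and the exact SQL depends on the configured database adapter):

```sql
-- A minimal sketch of the generated fetch queries; the table name and
-- id boundaries are illustrative, not taken from the actual pipeline.
SELECT * FROM source_table WHERE id >= 1      AND id < 100001;
SELECT * FROM source_table WHERE id >= 100001 AND id < 200001;
-- ...and so on, one query (flow file) per partition of 100000 ids,
-- which the downstream ExecuteSQL processor then runs.
```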
The attached figure shows the other properties of the processor.
In the ExecuteSQL processor I have set Max Rows Per Flow File to the same value as the Partition Size: 100000.
I am running NiFi 1.18.0 (Java 11.0.20.1) on Ubuntu Linux 22.04.
Any hint as to why this is happening? Is there a maximum size limit for the source table used by the GenerateTableFetch processor? And how would I manage to load such a huge table with NiFi?
Thanks,
Bernardo
By running GenerateTableFetch in DEBUG mode I noticed that it was querying for entries whose id was greater than the maximum id in the table. The processor had already been run against the same source table during the testing phase, so its stored maximum-value state already pointed at the end of the table; this was why it produced no output from that table.
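In other words, the leftover maximum-value state caused every generated query to filter past the end of the table, something like the following (a sketch; the stored value of 420000000 is hypothetical):

```sql
-- Stale maximum-value state from the testing runs adds a filter such as:
SELECT * FROM source_table
WHERE id > 420000000;  -- hypothetical maximum id stored during testing
-- No row has an id above the stored maximum, so no flow files are emitted.
```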
I have cleared the state of the processor (right-click the processor, then View state → Clear state) and now it is working as expected.