A simple source read from a Postgres table (fetching 3 of its 20 columns) is taking a huge amount of time. I feed those rows into a Stream Lookup step, where I fetch one column of information.
Here is the log:
2020/05/15 07:56:03 - load_identifications - Step **Srclkp_Individuals.0** ended successfully, processed 4869591 lines. ( 7632 lines/s)
2020/05/15 07:56:03 - load_identifications - Step LookupIndiv.0 ended successfully, processed 9754378 lines. ( 15288 lines/s)
The table input query is:
SELECT
id as INDIVIDUAL_ID,
org_ext_loc
FROM
individuals
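To check whether the read itself is the bottleneck, the plan can be inspected in psql. A minimal sketch — the index name is hypothetical, and the INCLUDE clause needs Postgres 11+:

-- Check the actual plan and timing of the read
EXPLAIN (ANALYZE, BUFFERS)
SELECT id AS INDIVIDUAL_ID, org_ext_loc FROM individuals;

-- Hypothetical covering index: may let Postgres serve both columns from an
-- index-only scan instead of reading the full 20-column rows (Postgres 11+)
CREATE INDEX idx_individuals_id_org ON individuals (id) INCLUDE (org_ext_loc);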
This table is in Postgres, with barely 20 columns and about 4.8 million rows.
This is Pentaho Data Integration 7.1; our server details are below.
**Our server information**:
OS : Oracle Linux 7.3
RAM : 65707 MB
HDD Capacity : 2 Terabytes
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 16
CPU MHz: 2294.614
I am connecting to Postgres via JDBC.
I don't know what else I can do; I only get about 15K rows/s of throughput.
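One knob that often matters with the PostgreSQL JDBC driver is the fetch size: by default the driver materializes the whole result set before handing rows on, which hurts a read of 4.8 million rows. The driver's documented defaultRowFetchSize option makes it fetch rows in batches instead (it only takes effect when autocommit is disabled on the connection). A sketch of the connection URL, with host, port, and database name as placeholders:

jdbc:postgresql://dbhost:5432/mydb?defaultRowFetchSize=10000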
When you use a Table Input with a Stream Lookup, the way Pentaho runs the stream lookup is slower than using a Database Lookup. As @nsousa suggested, I checked this with a Dummy step and learned that Pentaho handles each step type differently.
Even though Database Lookup and Stream Lookup fall into the same category, the Database Lookup performs better in this situation.
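If the lookup source lives in the same Postgres database as the main stream, a further option along the same lines is to skip the lookup step entirely and push the join into the Table Input query. A hypothetical sketch, where main_table and its individual_id key stand in for the actual stream source:

-- Hypothetical: let Postgres do the join instead of PDI
SELECT m.*,
       i.org_ext_loc
FROM main_table m
LEFT JOIN individuals i ON i.id = m.individual_id;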
The Pentaho help pages give some ideas and suggestions on this.