I am trying to filter records from an Excel sheet with over 94,000 records using a simple validation, and write the matching records to a new Excel sheet using Pentaho. However, the speed of reading the input file and filtering the records gradually drops to less than 50 r/s once it passes 20,000 records.
Is there a way to speed up the processing of the records, or at least maintain the initial speed of 1000 r/s?
I think the best way to solve this is to split the current transformation into two transformations and one job. The first transformation reads the Excel rows, filters them, and then uses a Text File Output step to write the result to a "temporary" CSV file. The second transformation reads the CSV file created by the first one and exports it to Excel with the Excel Writer step. The job runs the two transformations one after the other. Reading rows from a plain-text CSV is faster than reading Excel, and Excel writing is extremely limited in Spoon. Have fun.
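Outside of Spoon, the same two-stage idea can be sketched in plain Python. This is only an illustration of the staging pattern, not Pentaho code: the input rows stand in for the Excel Input step, and `is_valid` is a hypothetical placeholder for the "simple validation". Stage 1 filters and writes the survivors to a temporary CSV; stage 2 streams that CSV back one row at a time, which is where the rows would be handed to an Excel writer.

```python
import csv
import os
import tempfile
from typing import Callable, Iterable, Iterator, List

Row = List[str]


def stage1_filter_to_csv(rows: Iterable[Row],
                         is_valid: Callable[[Row], bool],
                         csv_path: str) -> int:
    """First transformation: keep only valid rows and write them to a temporary CSV.

    Returns the number of rows written.
    """
    kept = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in rows:
            if is_valid(row):
                writer.writerow(row)
                kept += 1
    return kept


def stage2_read_csv(csv_path: str) -> Iterator[Row]:
    """Second transformation: stream the temporary CSV back, one row at a time."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        yield from csv.reader(f)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the Excel sheet.
    sample = [[str(i), "ok" if i % 3 else "bad"] for i in range(10)]
    tmp = os.path.join(tempfile.gettempdir(), "filtered_rows.csv")
    kept = stage1_filter_to_csv(sample, lambda r: r[1] == "ok", tmp)
    print(kept, "rows written to", tmp)
    for row in stage2_read_csv(tmp):
        pass  # Hand each row to the Excel writer here.
```

Keeping the intermediate file as plain CSV means the second stage never has to parse the original workbook again; it only reads text rows sequentially.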