Search code examples
talend

How to apply a row limit based on percentage filtration in Talend?


I am using Talend's tDBInput component to read data from a SQL table, followed by a tMap component to filter the rows. The number of rows that pass through the tMap varies depending on the data in the SQL table.

Scenario:

  • If the percentage of filtered rows exceeds 80%, I want the flow to continue to the tDBOutput component.
  • If the percentage of filtered rows is 80% or less, the process should stop or redirect to a tLogRow component.

Example:

  • If my table has 1000 rows and the tMap filters it down to 600 rows (60%), the process should stop or redirect to tLogRow.
  • If my table has 100 rows and the tMap filters it down to 95 rows (95%), it should proceed to tDBOutput.

What I Have Tried:

I learned about the global variable NB_LINE, which can be used to get the number of rows written to the component's file or table. However, I am unsure how to implement this to achieve the desired flow for my scenario.

Questions:

  1. How can I use the NB_LINE variable to determine the percentage of filtered rows?
  2. What components or logic should I use to implement the conditional flow based on the percentage of rows?

Any guidance on how to set up this flow in Talend would be greatly appreciated!


Solution

  • global variable is the way to go I think, but this variable is only accessible on "AFTER" mode, meaning that you have to exit subjob to get the number of line processed.

    useful components in this case are the pair tHashInput/tHashOutput (hidden by default but you can activate them through the "palette" menu in project settings). With them you are able to use a cache for your data, and separate in multiple subjobs.

    • one subjob to filter data
    • then with IF links supporting your threshold condition, you can route to desired output.

    enter image description here