Search code examples
datastage

How to capture the last record in a file


I have a requirement to split a sequential file into 3 parts, Header, Data, Trailer. I have the header and Data worked out.

Is there a way, in a Transformer, to determine if you have the last record in a sequential file? I tried using LastRow() but that gives me the last row for each node. I need to leave parallelize on.

Thanks in advance for any help.


Solution

  • You have no a priori knowledge about which node the trailer row will come through on. There is therefore no solution in a Transformer stage if you want to retain parallel execution.

    One way to do it is to have a reject link on the Sequential File stage. This will capture any row that does not match the defined metadata. Set up the stage with the metadata for your Data rows, then the Header and Trailer will be captured onto the reject link. It should be pretty obvious from their data which is which, and you can process them further and perhaps even rejoin them to your Data rows.

    You could also capture the last row separately (e.g. via head -1 filename) and compare that against every row processed to determine if it's the last. Computationally heavy for very little gain.