talend

Store row numbers which are causing "error"



I need to retrieve certain information from URLs. To do this, I have to insert text into fields of the URL, using a GET request, and replace the spaces in the text with "%20". Sometimes the text (which is taken from the database) is badly formed. I would like to know the row numbers of such rows so that I can manually fix the text in the database and run the job again. I have tried the logs and errors section, but with little luck. Does anybody have an idea of how to do this?


Solution

  • First shot: Output bad URLs on the console

    So far, I came up with the following job design for your problem: [screenshot: Job design and tJavaFlex component view]

    The trick is to catch the exceptions of the tHttpRequest component and print the necessary details on the console. For this example, I included the line number, the exception message and the URL that produced the exception.

    Output (I couldn't reproduce your "Illegal character" error, so I took a different one): [screenshot: Job output]
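    Outside of Talend, the same idea can be sketched in plain Java. This is only an illustration under assumptions, not the code from the tJavaFlex in the screenshot: the class and method names are made up, and parsing the URL with java.net.URI stands in for the actual tHttpRequest call. A URL with an unescaped space fails to parse, and the sketch reports the row number, the exception message and the URL.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not generated Talend code.
class BadUrlConsoleSketch {

    // Returns one report line per URL that fails; row numbers start at 1.
    static List<String> findBadUrls(List<String> urls) {
        List<String> reports = new ArrayList<>();
        int rowNumber = 0;
        for (String url : urls) {
            rowNumber++;
            try {
                // Stand-in for the HTTP request: parsing rejects illegal characters.
                new URI(url);
            } catch (URISyntaxException e) {
                reports.add("Row " + rowNumber + ": " + e.getMessage() + " | URL: " + url);
            }
        }
        return reports;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://example.com/search?q=ok",
                "http://example.com/search?q=bad text");
        // Print the details of every failing row on the console.
        for (String report : findBadUrls(urls)) {
            System.out.println(report);
        }
    }
}
```

    Catching the exception per row, instead of letting it abort the loop, is what allows the remaining rows to be processed.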

    Second shot: Output to a file

    If you really need to output the line numbers to a file, things get a little more complicated.

    [screenshot: Job version 2]

    Instead of printing the info straight onto the console, we collect all line numbers into a context variable of type (Java) List inside the tJavaFlex. After the usual URL processing (which I have left out from the job design to keep the example small), we iterate over the Java List and save it into a tHashOutput, so that we can finally write to a file.
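    The two phases described above (collect first, iterate afterwards) could look roughly like this in plain Java. It is a sketch under assumptions: the static list badRows plays the role of the List-typed context variable, its name is hypothetical, and URI parsing again stands in for the real request.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not generated Talend code.
class BadRowCollectorSketch {

    // Plays the role of the List-typed context variable (name is an assumption).
    static final List<Integer> badRows = new ArrayList<>();

    // Phase 1: process every URL and remember the 1-based rows that fail.
    static void processAll(List<String> urls) {
        badRows.clear(); // start each run with an empty list
        for (int i = 0; i < urls.size(); i++) {
            try {
                new URI(urls.get(i)); // stand-in for the HTTP request
            } catch (URISyntaxException e) {
                badRows.add(i + 1);
            }
        }
    }

    // Phase 2: after processing, loop over the collected numbers and
    // turn each one into an output row (here simply a string).
    static List<String> toOutputRows() {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < badRows.size(); i++) {
            rows.add(String.valueOf(badRows.get(i)));
        }
        return rows;
    }
}
```

    Calling processAll and then toOutputRows yields the rows in one piece, so that they can be written out in a single step afterwards.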

    We cannot write directly to the file from the tLoop section, because the Iterate flow would open the tFileOutputDelimited once per iteration. With "Append" disabled, only the last bad URL line number would end up in the output file. With "Append" enabled, you would get the full list of line numbers after the very first job run, but every later run would append to it again, making the list longer and longer. Workarounds would be to use a runtime-dependent file name (e.g. a timestamp) or to delete the file at the beginning of the job run. I chose a third option, buffering the line numbers in a tHashOutput first, which overwrites the file on every job run. Feel free to choose whichever of these options suits your use case best.
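    The difference between the three behaviours can be reproduced with plain Java file I/O. The sketch below is an analogy, not Talend code: the class and method names are hypothetical, and each Files.write call stands in for one opening of the output file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-alone sketch; not generated Talend code.
class BadRowFileSketch {

    // One open per row without append: every write truncates the file,
    // so only the last row survives.
    static void writePerRowOverwrite(Path file, List<String> rows) throws IOException {
        for (String row : rows) {
            Files.write(file, Collections.singletonList(row), StandardCharsets.UTF_8);
        }
    }

    // One open per row with append: complete after the first run,
    // but the file keeps growing on every later run.
    static void writePerRowAppend(Path file, List<String> rows) throws IOException {
        for (String row : rows) {
            Files.write(file, Collections.singletonList(row), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    // Buffer all rows first, then open the file once: each run
    // replaces the previous content with the full list.
    static void writeOnce(Path file, List<String> rows) throws IOException {
        Files.write(file, rows, StandardCharsets.UTF_8);
    }
}
```

    Only writeOnce gives the same, complete file no matter how often it runs, which is the reason for buffering the rows before writing.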

    Details

    The tHashOutput/tHashInput components are not visible by default; they have to be enabled before they show up: https://www.talendforge.org/forum/viewtopic.php?pid=107249#p107249

    Context variable: [screenshot: Context variables]

    INIT: [screenshot: INIT]

    tJavaFlex "catch errors", end code: [screenshot: tJavaFlex]

    tLoop: [screenshot: tLoop]

    tFixedFlowInput "badURL": [screenshot: badURL]

    tHashOutput:

    Needs to have "Append" enabled.