Talend - tFileInputRegex


My goal is to parse several text files with a regex and, when a file matches, copy it under a new name that includes the matched string:

[Screenshot: Talend project overview using tFileInputRegex]

The regex should find all rows matching "Invoice - xxxxx" or "Num.Ord - yyyyy".

So I can have files like this (Invoice - 10044165 RI):

     Company XXX, LLC                                          Page Number-            1
 P.O. Box 26610                                        I N V O I C E                      Date       -     02/15/05
 Miami, MI  64196                                                                         Customer   -        20035
                                                                          Lot Potency.     50006427
                                                                                          Brn/Plt    -    100780000
                                              REMIT TO:                                   Order Nbr  -    242242 SO
                                                            .                             Invoice    -  10044165 RI

Or like this (Num.Ord - 50006427):

     Company XXX, LLC                                          Page Number-            1
 P.O. Box 26610                                        I N V O I C E                      Date       -     02/15/05
 Miami, MI  64196                                                                         Customer   -        20035
                                                                          Num.Ord    -     50006427
                                                                                          Brn/Plt    -    100780000
                                              REMIT TO:                                  
                                                            .                         
                                              126 Ctest
                                              Chicago, IL

I'm trying to figure out how to write a working OR regex that finds rows containing either "Num.Ord" or "Invoice". I tested this one in an online regex tester and it works:

[\n\r].*(Invoice|Num.Ord)\s*-\s*([^\n\r]*)

When I use it in the Talend component tFileInputRegex, escaped as a Java string, it does not work (the "Num.Ord" alternative never matches):

"[\\n\\r].*(Invoice|Num.Ord)\\s*-\\s*([^\\n\\r]*)"
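To narrow down whether the pattern or the component is at fault, the same string literal can be checked in plain Java outside Talend. This is only a minimal sketch: the class and method names (InvoiceRegexCheck, findAll) are made up for illustration, and the sample text is a shortened version of the files above. Two properties of the pattern are visible here: the leading [\n\r] means a row can match only if a line break precedes it in the text being searched, and the unescaped dot in Num.Ord matches any character, not just a literal ".".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Talend.
final class InvoiceRegexCheck {
    // The exact string literal from the question.
    static final Pattern PATTERN =
        Pattern.compile("[\\n\\r].*(Invoice|Num.Ord)\\s*-\\s*([^\\n\\r]*)");

    // Returns "label=value" for every row the pattern finds in the text.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            hits.add(m.group(1) + "=" + m.group(2).trim());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Shortened sample: a header line followed by the rows of interest.
        String text = "Company XXX, LLC\n"
            + "   Num.Ord    -     50006427\n"
            + "   Brn/Plt    -    100780000\n";
        System.out.println(findAll(text));

        // The same row with no preceding line break is not found,
        // because the pattern starts with [\n\r].
        System.out.println(findAll("   Num.Ord    -     50006427"));
    }
}
```

If the pattern behaves as expected here but not in the job, the difference is more likely in how the component feeds text to the regex (for example its row separator setting) than in the OR itself; that is an assumption worth checking against the component's configuration.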

Solution

  • In the end I solved it by adding a second tFileInputRegex component that re-parses the files rejected by the first one:

    [Screenshot: Talend job schema]
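The logic of this two-pass workaround can be sketched in plain Java. This is not the Talend job itself: the class name, the two single-purpose patterns, and the naming scheme for the copied file (match, underscore, original name) are all assumptions made for illustration. Each pattern is applied to one row at a time, and the second pattern is tried only when the first finds nothing.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "try Invoice first, then Num.Ord on the rejects".
final class TwoPassRename {
    // One pattern per pass; the dot in Num.Ord is escaped so it matches literally.
    static final Pattern INVOICE = Pattern.compile("Invoice\\s*-\\s*(.*\\S)");
    static final Pattern NUM_ORD = Pattern.compile("Num\\.Ord\\s*-\\s*(.*\\S)");

    // Scans the content row by row and returns the first captured value, if any.
    static Optional<String> firstMatch(Pattern p, String content) {
        for (String row : content.split("\\R")) {
            Matcher m = p.matcher(row);
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    // Builds the new file name, or returns empty when neither pass matches.
    static Optional<String> targetName(String originalName, String content) {
        Optional<String> key = firstMatch(INVOICE, content);
        if (!key.isPresent()) {
            key = firstMatch(NUM_ORD, content); // second pass on the "rejected" file
        }
        // Assumed naming scheme: spaces in the match become underscores.
        return key.map(k -> k.replaceAll("\\s+", "_") + "_" + originalName);
    }

    public static void main(String[] args) {
        String invoiceFile = "Company XXX, LLC\n  .   Invoice    -  10044165 RI\n";
        String orderFile = "Company XXX, LLC\n   Num.Ord    -     50006427\n";
        System.out.println(targetName("a.txt", invoiceFile));
        System.out.println(targetName("b.txt", orderFile));
    }
}
```

Splitting the alternation into two simple patterns mirrors the job layout described above, where the second component only sees what the first one rejected.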