My goal is to parse several text files with a regex and, when a match is found, copy the file under a new name that includes the matched string:
[Image: Talend project overview using tFileInputRegex]
The regex should match every row containing "Invoice - xxxxx" or "Num.Ord - yyyyy".
So I can have files like this (Invoice - 10044165 RI):
Company XXX, LLC Page Number- 1
P.O. Box 26610 I N V O I C E Date - 02/15/05
Miami, MI 64196 Customer - 20035
Lot Potency. 50006427
Brn/Plt - 100780000
REMIT TO: Order Nbr - 242242 SO
. Invoice - 10044165 RI
Or like this (Num.Ord - 50006427):
Company XXX, LLC Page Number- 1
P.O. Box 26610 I N V O I C E Date - 02/15/05
Miami, MI 64196 Customer - 20035
Num.Ord - 50006427
Brn/Plt - 100780000
REMIT TO:
.
126 Ctest
Chicago, IL
I'm trying to work out a regex with an OR (alternation) that finds rows containing "Num.Ord" or "Invoice". I tested this one in an online regex tester and it works:
[\n\r].*(Invoice|Num.Ord)\s*-\s*([^\n\r]*)
When I use it in the Talend tFileInputRegex component, with the backslashes escaped for a Java string, it does not work (the "Num.Ord" alternative never matches):
"[\\n\\r].*(Invoice|Num.Ord)\\s*-\\s*([^\\n\\r]*)"
In the end I worked around it by adding a second tFileInputRegex component that parses the files rejected by the first one:
[Image: Talend job schema]
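To check the alternation outside Talend, here is a minimal plain-Java sketch using `java.util.regex`. It is not the tFileInputRegex implementation. The class name `InvoiceRegexSketch` and the helper `extract` are made up for illustration. It also changes the pattern in two ways: the dot in `Num\.Ord` is escaped so it only matches a literal dot, and the leading `[\n\r]` is replaced by a `MULTILINE` `^` anchor. The second change rests on an assumption I have not confirmed: if the component hands the pattern one row at a time, a leading `[\n\r]` would have no newline character to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InvoiceRegexSketch {
    // Variant of the pattern from the question (illustrative, not Talend's own):
    // - "^" with MULTILINE replaces the leading [\n\r], so a match can start
    //   at the beginning of the input as well as after a line break
    // - the dot in "Num\.Ord" is escaped so it matches a literal dot only
    static final Pattern KEY_LINE = Pattern.compile(
            "^.*(Invoice|Num\\.Ord)\\s*-\\s*([^\\n\\r]*)$", Pattern.MULTILINE);

    // Returns one "label=value" entry per matching row of the given text.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = KEY_LINE.matcher(text);
        while (m.find()) {
            // group(1) is the label, group(2) is everything after the dash
            found.add(m.group(1) + "=" + m.group(2).trim());
        }
        return found;
    }

    public static void main(String[] args) {
        // Shortened excerpts of the two sample files from the question
        String invoiceFile = "REMIT TO: Order Nbr - 242242 SO\n. Invoice - 10044165 RI\n";
        String numOrdFile = "Miami, MI 64196 Customer - 20035\nNum.Ord - 50006427\nBrn/Plt - 100780000\n";
        System.out.println(extract(invoiceFile)); // [Invoice=10044165 RI]
        System.out.println(extract(numOrdFile));  // [Num.Ord=50006427]
    }
}
```

If both alternatives match here but not in the job, the difference is more likely in how the component feeds rows to the pattern than in the OR itself.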