I am using Talend Data Integration.
I have a log file like this:
I - Fab - 392 - 2014/12/20 22:09:15:200 - XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Begin :
I - Fab - 392 - 2014/12/20 22:12:15:438 - XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Bus / Before :
500|00104|002PL|0036364043 |005PL
809|001BBG|00365 |005-0200|006+0000|007000|0080000|0240|0250|0260|0270|0280|0290|033STK|034063100 |0441
830|0093100 |0441
I - Fab - 392 - 2014/12/20 22:12:19:766 - XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Bus / After :
500|00104|002PL|0036364043 |005PL
510|001BBG|00365 |005-0200|006+0000|007000|0080000|0240|0250|0260|0270|0280|0290|033STK|034063100 |0441
I want to extract lines 2 & 3 and 6 & 7 (the wanted lines do not always fall on an even/odd pair of line numbers). I used this regular expression:
"I - (Fab|Opt) - \\d+ - (\\d{4}/\\d{2}/\\d{2}) (\\d{2}:\\d{2}:\\d{2}:\\d{3}) - .+ Bus / (.+) : \\n500|.+|003(\\d{7}).+"
using a tFileInputRegex; however, I don't know what to use as the row separator (the default is "\n").
I want my output to be a CSV file containing data extracted from the first and second lines of each pair.
I used a tMap to generate the CSV file, but the problem is that I cannot extract the data I want.
Once I can extract the data, I will be able to generate the file, so I need help with the regex part. Is there a way in Talend DI to match multiple rows (in my case two) using tFileInputRegex?
EDIT:
I have specified "I -" as the row separator so that I can use \n in the regex without ambiguity, but the regex still doesn't match.
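The \n delimiter for multiline rows should work, so this is more an issue with your overall regex. In particular, the unescaped | characters in your pattern act as alternation rather than matching literal pipes. Try a pattern such as this one, which should capture the groups correctly: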
I.+(Fab|Opt).+(\\d{4}\\/\\d{2}\\/\\d{2}).+(\\d{2}:\\d{2}:\\d{2}:\\d{3}).+Bus\\s\\/\\s(\\w+)\\s:\\W+\\n500.+003(\\d{7}).+
Example:
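A minimal sketch in plain Java (java.util.regex, outside Talend) for checking a pattern of this kind against a shortened copy of the sample log. The class name LogRegexSketch, the helper toCsv, and the comma separator are illustrative assumptions, not Talend APIs. The pattern here is a variant rather than the one above: it escapes the literal pipes and uses \\s*\\n in place of \\W+\\n, so it also matches when the header line ends directly after the colon. Whether tFileInputRegex applies the pattern in exactly the same way has not been verified here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogRegexSketch {
    // Shortened copy of the log from the question; "\n" line endings are an assumption.
    static final String SAMPLE = String.join("\n",
        "I - Fab - 392 - 2014/12/20 22:09:15:200 - XXXXXXXX Begin :",
        "I - Fab - 392 - 2014/12/20 22:12:15:438 - XXXXXXXX Bus / Before :",
        "500|00104|002PL|0036364043 |005PL",
        "809|001BBG|00365 |005-0200|006+0000",
        "I - Fab - 392 - 2014/12/20 22:12:19:766 - XXXXXXXX Bus / After :",
        "500|00104|002PL|0036364043 |005PL");

    // Header line plus the "500|..." record that follows it.
    // "\\|" matches a literal pipe; a bare "|" would mean alternation.
    static final Pattern ENTRY = Pattern.compile(
        "I - (Fab|Opt) - \\d+ - (\\d{4}/\\d{2}/\\d{2}) (\\d{2}:\\d{2}:\\d{2}:\\d{3})"
        + " - .+ Bus / (\\w+) :\\s*\\n"
        + "500\\|.+\\|003(\\d{7})");

    // Returns one CSV row per matched header/record pair:
    // source, date, time, Before/After, 7-digit value of field 003.
    static List<String> toCsv(String log) {
        List<String> rows = new ArrayList<>();
        Matcher m = ENTRY.matcher(log);
        while (m.find()) {
            rows.add(String.join(",",
                m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)));
        }
        return rows;
    }

    public static void main(String[] args) {
        toCsv(SAMPLE).forEach(System.out::println);
    }
}
```

Because "." does not match a line break by default, the "Begin" header is skipped (it has no "Bus /" on the same line), and each match spans exactly two lines: the header and its 500 record.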