I'm having trouble matching multiple groups, some of which are optional. I've tried greedy and non-greedy variations, but can't get it to work.
As input, I have cells which look like this:
SEPA Overboeking IBAN: AB1234 BIC: LALA678 Naam: John Smith Omschrijving: Hello hello Kenmerk: 03-05-2019 23:12 533238
I want to split these up into the groups IBAN, BIC, Naam, Omschrijving and Kenmerk.
For this example, that should yield: AB1234; LALA678; John Smith; Hello hello; 03-05-2019 23:12 533238. To obtain this, I've used:
.*IBAN: (.*)\s+BIC: (.*)\s+Naam: (.*)\s+Omschrijving: (.*)\s+Kenmerk: (.*)
This works perfectly as long as all of these groups are present in the input. Some cells, however, don't have the "Omschrijving" and/or "Kenmerk" part. As output, I would like to get empty groups for the parts that are not present. Right now, nothing is matched at all for such cells.
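A minimal reproduction of the problem, assuming Python's re module instead of KNIME (group handling may differ slightly between the two engines). The cell without "Kenmerk" is a hypothetical variant of the example above:

```python
import re

# The pattern from above: all five labels are mandatory.
ORIGINAL = re.compile(
    r".*IBAN: (.*)\s+BIC: (.*)\s+Naam: (.*)\s+Omschrijving: (.*)\s+Kenmerk: (.*)"
)

full = ("SEPA Overboeking IBAN: AB1234 BIC: LALA678 Naam: John Smith "
        "Omschrijving: Hello hello Kenmerk: 03-05-2019 23:12 533238")
# Hypothetical cell in which the Kenmerk part is absent.
no_kenmerk = ("SEPA Overboeking IBAN: AB1234 BIC: LALA678 Naam: John Smith "
              "Omschrijving: Hello hello")

full_match = ORIGINAL.match(full)
missing_match = ORIGINAL.match(no_kenmerk)

print(full_match.groups())   # all five groups are filled
print(missing_match)         # None: the mandatory "Kenmerk: " label is not found
```

Because "\s+Kenmerk: " is a required part of the pattern, the whole match fails as soon as that label is missing, no matter how the quantifiers are set.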
Help would be greatly appreciated!
N.B.: I'm working in KNIME (open source data analysis tool)
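I was able to split your input using the following regular expression (it is spread over several lines for readability only; join the lines into a single line before using it):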
^.*
\s+IBAN\:\s*(?<IBAN>.*?)
\s+BIC\:\s*(?<BIC>.*?)
\s+Naam\:\s*(?<Naam>.*?)
(?:\s+Omschrijving\:\s*(?<Omschrijving>.*?))?
(?:\s+Kenmerk\:\s*(?<Kenmerk>.*?))?
$
This requires your fields to follow the given order and treats the fields IBAN, BIC and Naam as required. The fields Omschrijving and Kenmerk are optional. I'm pretty sure this can still be optimized, but it results in the following output, which should be fine for you (or at least a starting point):
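As a rough way to check the pattern outside KNIME, here is a sketch in Python. Two things are assumptions on my side and not part of the KNIME setup: Python writes named groups as (?P<name>...) instead of (?<name>...), and the helper name split_cell is made up for this example. The test cells without Omschrijving or Kenmerk are hypothetical variants of the line from the question.

```python
import re

# Same structure as the expression above, written in Python syntax.
# re.VERBOSE ignores whitespace and line breaks inside the pattern,
# so it can stay spread over several lines.
PATTERN = re.compile(
    r"""
    ^.*
    \s+IBAN:\s*(?P<IBAN>.*?)
    \s+BIC:\s*(?P<BIC>.*?)
    \s+Naam:\s*(?P<Naam>.*?)
    (?:\s+Omschrijving:\s*(?P<Omschrijving>.*?))?
    (?:\s+Kenmerk:\s*(?P<Kenmerk>.*?))?
    $
    """,
    re.VERBOSE,
)


def split_cell(cell):
    """Return the five fields as a dict, or None if a required field is missing.

    Optional fields that are absent come back as empty strings.
    """
    match = PATTERN.match(cell)
    if match is None:
        return None
    # Groups that did not take part in the match are None in Python.
    return {name: value or "" for name, value in match.groupdict().items()}


base = "SEPA Overboeking IBAN: AB1234 BIC: LALA678 Naam: John Smith"
cells = [
    base + " Omschrijving: Hello hello Kenmerk: 03-05-2019 23:12 533238",
    base + " Omschrijving: Hello hello",           # no Kenmerk
    base + " Kenmerk: 03-05-2019 23:12 533238",    # no Omschrijving
    base,                                          # neither optional field
]
for cell in cells:
    print(split_cell(cell))
```

The lazy quantifiers only stop at the right place because the pattern is anchored with $: each group grows until the next label, or the end of the cell, can be matched.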
For evaluation and testing in KNIME, I used Palladian's Regex Extractor node, which can be configured as follows and provides a nice preview:
I added an example workflow to my NodePit Space. It contains some example lines, parses them and produces the output shown above.