Regex to remove duplicate numbers from a string

I have produced a data set with codes separated by pipe symbols, and I realized there are many duplicates in each row. Here are four example rows (the regex is applied to each row individually in KNIME):

0612|0613|061|0612|0612
0211|0612|021|0212|0211|0211
0111|0111
0511|0512|0511|0511|0521|0512|0511

I am trying to build a regex that removes the duplicate codes from each row. I tested \b(\d+)\b.*\b\1\b from a different thread here, but that expression does not keep the other codes. The desired outputs for the example rows above would be:

0612|0613|061
0211|0612|021|0212
0111
0511|0512|0521

Appreciate your help

Solution

I don't know which regex engine KNIME uses.

To do it in one pass you probably need an engine that supports variable-length lookbehind, e.g. .NET:

\|(\d+)\b(?<=\b\1\b.*?\1)

See .NET regex demo at Regexstorm (check [•] replace matches with, click on "context")

Update: It turns out that KNIME uses Java's Pattern implementation.

Java regex does implement variable-width lookbehind, but only with finite repetition, i.e. bounded quantifiers such as {1,999}. The second issue is that the backreference \1 can't be used directly inside a lookbehind. So we need a trick: put the backreference into a lookahead, and put that lookahead inside the lookbehind.

Let's assume that duplicates are at most 999 characters apart and that each field contains up to 9 digits (adjust these values to your needs).

\|(\d+)\b(?<=\b(?=\|?\1\b).{1,999}?\|\d{1,9})

Java regex demo at Regex101 (explanation on right side)

0612|0613|061
0211|0612|021|0212
0111
0511|0512|0521
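
For anyone who wants to try the pattern above outside KNIME, here is a minimal Java sketch. The class name DedupKeepFirst and the method dedupe are made up for illustration, and the limits 999 and 9 are the same assumptions as above. Note that every backslash has to be doubled inside a Java string literal. Run on the four example rows, it should print the outputs listed above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of KNIME or any library.
class DedupKeepFirst {
    // Matches "|code" only if the same code already occurs earlier in the row.
    // The lookbehind is bounded ({1,999} and {1,9}), and the backreference \1
    // sits in a lookahead nested inside the lookbehind.
    private static final Pattern LATER_DUPLICATE = Pattern.compile(
            "\\|(\\d+)\\b(?<=\\b(?=\\|?\\1\\b).{1,999}?\\|\\d{1,9})");

    // Removes every later duplicate, so the first occurrence of each code stays.
    static String dedupe(String row) {
        return LATER_DUPLICATE.matcher(row).replaceAll("");
    }

    public static void main(String[] args) {
        String[] rows = {
            "0612|0613|061|0612|0612",
            "0211|0612|021|0212|0211|0211",
            "0111|0111",
            "0511|0512|0511|0511|0521|0512|0511"
        };
        for (String row : rows) {
            System.out.println(dedupe(row));
        }
    }
}
```

The .{1,999} part requires at least one character between the earlier occurrence and the matched field, which keeps a field from being treated as a duplicate of itself.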

With only a lookahead you can get unique codes per row too, but the other way round: the last occurrence of each code is kept instead of the first, so the order differs from your desired results.

\b(\d+)\|(?=.*?\b\1\b)

Another demo on Regex101

0613|061|0612
0612|021|0212|0211
0111
0521|0512|0511
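
The lookahead-only variant can be sketched in Java the same way. Again, the class name DedupKeepLast and the method dedupe are only illustrative names. Since no lookbehind is involved, no length limits have to be assumed here.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of KNIME or any library.
class DedupKeepLast {
    // Matches "code|" only if the same code occurs again later in the row.
    private static final Pattern EARLIER_DUPLICATE = Pattern.compile(
            "\\b(\\d+)\\|(?=.*?\\b\\1\\b)");

    // Removes every earlier duplicate, so the last occurrence of each code stays.
    static String dedupe(String row) {
        return EARLIER_DUPLICATE.matcher(row).replaceAll("");
    }

    public static void main(String[] args) {
        String[] rows = {
            "0612|0613|061|0612|0612",
            "0211|0612|021|0212|0211|0211",
            "0111|0111",
            "0511|0512|0511|0511|0521|0512|0511"
        };
        for (String row : rows) {
            System.out.println(dedupe(row));
        }
    }
}
```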

For further information, have a look at the Stack Overflow Regex FAQ.