
RegEx for adding a comma and space in between first/last names


I have a list of names where the last and first names appear together:

BorisovaSvetlana A.; KimHak Joong; PuXiaotao; LiuHung-wen*

I would like to add a comma and space between last and first names, for the output to be:

Borisova, Svetlana A.; Kim, Hak Joong; Pu, Xiaotao; Liu, Hung-wen*

I am using a String Manipulation node in KNIME, and I think regexReplace($col1$, ,"") would be used, perhaps with some kind of lookahead using [a-z] and [A-Z] to find a lowercase letter directly followed by a capital letter. I am new to regex, so that's all I have so far.

How do I solve this problem?


Solution

  • This regex might help you design an expression that matches all of your inputs:

    ([A-Z]{1}[a-z-]{1,})([A-Z]{1}[a-z-]{1,})
    
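    • It has two capturing groups: the first captures the last name and the second captures the first name.
    • It only matches the ASCII letters A-Z and a-z, so accented or other non-ASCII letters are not matched. If you need them, you could use the Unicode classes \p{Lu} (uppercase) and \p{Ll} (lowercase) instead. \w is not a safe replacement for a-z, because it also matches uppercase letters and digits, so the split would no longer be tied to the lowercase-to-uppercase boundary.
    • In the String Manipulation node, replace each match ($1$2) with $1, $2, for example: regexReplace($col1$, "([A-Z]{1}[a-z-]{1,})([A-Z]{1}[a-z-]{1,})", "$1, $2")
    • You can also add extra boundaries to the expression (such as \b) if your data needs them.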

    It means that:

    • Each group matches one capital letter followed by one or more lowercase letters or hyphens, once for the last name and once for the first name. You can change the character classes inside either group as you wish.
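A minimal sketch for checking the pattern outside KNIME, assuming Python's re module behaves like KNIME's Java regex engine for this simple pattern. Python writes the back-references as \1 and \2 instead of $1 and $2. The names NAME_PATTERN and add_comma are hypothetical helpers for illustration only.

```python
import re

# The answer's pattern: group 1 = last name, group 2 = first name.
NAME_PATTERN = re.compile(r"([A-Z]{1}[a-z-]{1,})([A-Z]{1}[a-z-]{1,})")


def add_comma(names: str) -> str:
    """Insert ", " between a fused last name and first name."""
    # Python uses \1 and \2 where KNIME/Java use $1 and $2.
    return NAME_PATTERN.sub(r"\1, \2", names)


if __name__ == "__main__":
    sample = "BorisovaSvetlana A.; KimHak Joong; PuXiaotao; LiuHung-wen*"
    # Show what each capturing group matched.
    for m in NAME_PATTERN.finditer(sample):
        print(m.group(1), "|", m.group(2))
    print(add_comma(sample))
    # → Borisova, Svetlana A.; Kim, Hak Joong; Pu, Xiaotao; Liu, Hung-wen*
```

Parts such as "Joong" or "A." are left alone because they are not a capitalized word immediately followed by another capitalized word.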



    Edit:

    Based on Pushpesh's advice, it can be simplified: {1} is redundant and {1,} is the same quantifier as +, so this expression is equivalent:

    ([A-Z][a-z-]+)([A-Z][a-z-]+)
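A quick sketch, again in Python rather than KNIME itself (an assumption for illustration), confirming that the original and simplified patterns rewrite a few test strings identically. The helper add_comma and the sample list are hypothetical.

```python
import re

ORIGINAL = r"([A-Z]{1}[a-z-]{1,})([A-Z]{1}[a-z-]{1,})"
SIMPLIFIED = r"([A-Z][a-z-]+)([A-Z][a-z-]+)"


def add_comma(names: str, pattern: str = SIMPLIFIED) -> str:
    # \1 and \2 are Python's back-references (KNIME/Java write $1 and $2).
    return re.sub(pattern, r"\1, \2", names)


samples = [
    "BorisovaSvetlana A.; KimHak Joong; PuXiaotao; LiuHung-wen*",
    "Svetlana A.",  # no fused pair, so it is left unchanged
    "LiuHung-wen*",  # hyphen stays inside the first name
]

for s in samples:
    # Both patterns must produce the same rewritten string.
    assert add_comma(s, ORIGINAL) == add_comma(s, SIMPLIFIED)
    print(add_comma(s))
```

In KNIME the simplified call would read regexReplace($col1$, "([A-Z][a-z-]+)([A-Z][a-z-]+)", "$1, $2").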
    
