
NiFi: Need to remove a character from an attribute while leaving all others, but having trouble with the expression


I have an attribute that contains URL-encoded characters. I need to run it through a URL decode, but for some reason some of the encoded characters are followed by an extra "%" symbol. I added an UpdateAttribute processor to try to fix it, but am having trouble with the expression.

Attribute: Name; value: name%c3%a1%s\<first> (NOTE: The length of the string before the "<" is variable.)

I need to change it to: name%c3%a1s\<first> such that the % after the a1 is removed. I have seen other letters directly before the "<", so I'm not sure how to remove the extra "%" but keep the letter. When I run ${Name:urlDecode()} with the attribute having value "name%c3%a1s\<first>" it works, but chokes when the value is "name%c3%a1%s\<first>".


Solution

  • You can use

    ${Name:replaceAll('%([a-zA-Z])\b', '$1')}
    

    Details:

    • % - a % char
    • ([a-zA-Z]) - Group 1: an ASCII letter (you may use \p{L} to match any letter)
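    • \b - a word boundary: the letter must not be followed by another letter, digit, or underscore. In a valid escape such as %c3 or %a1, the character right after the % is always followed by a second hex digit, so valid escapes never match.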

    The replacement is $1, a backreference to the value captured in Group 1, so the letter is kept and only the stray % is dropped.
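NiFi's replaceAll takes a Java regular expression, so the pattern can be checked outside NiFi with plain Java. The following is a minimal sketch, not NiFi code: the class name StrayPercentDemo and the method stripStrayPercent are made up for illustration, and the sample value is copied from the question as written, including the backslash before the "<".

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class StrayPercentDemo {

    // Hypothetical helper: applies the same pattern and replacement
    // as the NiFi expression ${Name:replaceAll('%([a-zA-Z])\b', '$1')}.
    static String stripStrayPercent(String value) {
        // In a Java string literal the regex \b must be written as \\b.
        return value.replaceAll("%([a-zA-Z])\\b", "$1");
    }

    public static void main(String[] args) {
        // Sample value from the question; "\\<" is a literal backslash plus "<".
        String broken = "name%c3%a1%s\\<first>";

        // Only "%s" matches: "s" is followed by a non-word character,
        // while "c" and "a" are each followed by a digit.
        String fixed = stripStrayPercent(broken);
        System.out.println(fixed);

        // With the stray % gone, the value can be URL-decoded as UTF-8.
        System.out.println(URLDecoder.decode(fixed, StandardCharsets.UTF_8));
    }
}
```

In a flow, the same two steps would presumably be chained in one expression, for example ${Name:replaceAll('%([a-zA-Z])\b', '$1'):urlDecode()}, assuming the cleanup and the decode happen in the same processor.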