string, nodes, knime

Remove part of a string in each row of a large column of data in KNIME


I am stumped.

I have a column with a few thousand rows of unique addresses for universities, pharma companies, etc. in a KNIME workflow.

Example: 55 Shattuck Street Boston Massachusetts 02115 US [NAT: US RES: US] for all designated states

What I need is to clean the data so that each row looks clean and computable, like this: 55 Shattuck Street Boston Massachusetts 02115 US.

My problem is that I can't seem to get the system to remove everything after US. Does anyone know a suitable approach in KNIME?


Solution

  • You should be able to use either String Replacer or String Manipulation for this. The first one lets you use either a simple wildcard or a full regular expression pattern, while the second one uses a Java-like syntax. The choice comes down to how many variations in the input data you need to handle and which syntax you prefer.

    If you just need to remove any text between square brackets, including the space before the opening bracket, then you can use String Replacer configured like this:

    [Screenshot: String Replacer configuration dialog]
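
To make the regular expression idea concrete, here is a minimal sketch in plain Java rather than a KNIME node configuration. The class name, method names and both patterns are my own assumptions for illustration, not what the screenshot shows. The first pattern removes only the bracketed text plus the whitespace before it, as described above; the second removes everything from the first opening bracket to the end of the string, which is what the example in the question seems to need.

```java
import java.util.regex.Pattern;

public class AddressCleaner {
    // Optional whitespace followed by one [...] group (hypothetical pattern).
    private static final Pattern BRACKETED = Pattern.compile("\\s*\\[[^\\]]*\\]");

    // Optional whitespace, the first '[', and everything up to the end of the string.
    private static final Pattern FROM_BRACKET = Pattern.compile("\\s*\\[.*$");

    // Removes only the bracketed parts; text after the closing bracket is kept.
    static String removeBracketed(String address) {
        return BRACKETED.matcher(address).replaceAll("");
    }

    // Removes the first bracketed part and anything that follows it.
    static String removeFromBracket(String address) {
        return FROM_BRACKET.matcher(address).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "55 Shattuck Street Boston Massachusetts 02115 US "
                + "[NAT: US RES: US] for all designated states";
        System.out.println(removeBracketed(raw));
        System.out.println(removeFromBracket(raw));
    }
}
```

Rows without any square brackets pass through both methods unchanged. If the same patterns are used in the regular expression mode of String Replacer or in String Manipulation, the escaping may need adjusting for that node, so test on a few rows first.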