Search code examples
data-cleaningopenrefine

OpenRefine - replacing string value in one column based no the value in another


I have a large CSV which contains information about how a collection is divided up. For example one column contains information about the top level category, another about the sub-category and there can be quite a few of these depending on sub-classifications.

In OpenRefine these look like(for example):

||field 1        || field 2                           || field 3
||I am a section || I am a section with a subsection  ||  I am a section with a subsection with another subsection

In order to be able to correctly split these out into top level and subsections I thought perhaps I could use the replace function to remove the value of field1 from the value of field 2 and onwards. This would leave me with

||field 1        || field 2           || field 3
||I am a section || with a subsection ||  with another subsection

My questions are:

  1. Is this the right approach or is there something more elegant?
  2. Is it is, how do I use the replace function to dynamically do this in the entire CSV?

Solution

  • You can reference another column with the following GREL expression: cells['field 2'].value

    For example, in your case, you will be replacing the value from field 2 by nothing (using ''), the expression is: value.replace(cells['field 2'].value,'')