Extract text using GREL in OpenRefine


I'm trying to add a column based on an existing column in OpenRefine using GREL.

I need to extract all the text after the second space in the scientific name.

Here are two examples of the original cell data ---> what I want to extract:

Amandinea punctata (Hoffm.) Coppins & Scheid. ---> (Hoffm.) Coppins & Scheid.

Agonimia tristicula (Nyl.) Zahlbr. ---> (Nyl.) Zahlbr.


Solution

  • Here are three ways to achieve the desired result on the given data, ordered from easy to understand to more advanced.

    Use column splitting

    You can split the column into three columns (Edit column > Split into several columns...) by choosing a space as the separator and limiting the number of new columns to 3 in the dialog. You can then delete the first two columns, which leaves the desired result.

    Use Array functions

    You can apply the same technique in GREL with arrays: split on a space, discard the first two entries, and join the rest with a space.

    value.split(" ").slice(2).join(" ")
    

    Use regular expressions

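    You can also use the match function with a regular expression. match has to match the whole string and returns an array of the capture groups, so [0] is the text captured by (.+).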

    value.match(/\S+\s\S+\s(.+)/)[0]
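
To check what the array and regex approaches should return before running them on a whole column, the same logic can be sketched in plain Python outside OpenRefine. This is only an illustration, not GREL: the function names are made up for this example, and `re.fullmatch` is used as a stand-in for the whole-string behaviour of GREL's match.

```python
import re


def after_second_space_split(value):
    # Mirrors value.split(" ").slice(2).join(" "):
    # split on single spaces, drop the first two entries, re-join the rest.
    return " ".join(value.split(" ")[2:])


def after_second_space_regex(value):
    # Mirrors value.match(/\S+\s\S+\s(.+)/)[0]:
    # the pattern must cover the whole string; group 1 is the remainder.
    m = re.fullmatch(r"\S+\s\S+\s(.+)", value)
    return m.group(1) if m else None


samples = [
    "Amandinea punctata (Hoffm.) Coppins & Scheid.",
    "Agonimia tristicula (Nyl.) Zahlbr.",
]

for name in samples:
    print(after_second_space_split(name), "|", after_second_space_regex(name))
```

For a name with only two words, such as "Amandinea punctata", the split version in this sketch returns an empty string and the regex version returns None, so it is worth deciding how such rows should be handled.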