cluster-analysis data-cleaning openrefine

Clustering words in sentences in OpenRefine


I'd like to cluster words in a text file with rows like this:

number queries waiting support representatives become available
query numbers 

More specifically, I want to replace words with their cluster representatives without changing the sentences otherwise.

What I'm trying to do is:

1. Split my column at spaces into more columns, each holding one word per row.
2. Cluster all the columns.
3. Merge the columns back.

But this is very tedious. I'd like to hear about an easier and perhaps more elegant solution.


Solution

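  • A probably better solution is to create a record for each row (for example, by keeping a unique key in the first column), then use "Split multi-valued cells" with a space as the separator, cluster the resulting single column, and finally use "Join multi-valued cells" with a space to rebuild each sentence.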

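The same split, cluster, join idea can be sketched outside OpenRefine in a few lines of Python. This is only an illustration under stated assumptions: `naive_key` is a hypothetical stand-in for OpenRefine's clustering methods (it just lowercases and strips a simple plural ending), and the representative of each cluster is taken to be its most frequent member, with ties going to the word seen first.

```python
from collections import Counter, defaultdict


def naive_key(word):
    """Hypothetical clustering key: lowercase and strip a simple plural ending.

    This is NOT one of OpenRefine's methods, only a placeholder for illustration.
    """
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"          # queries -> query
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]                # numbers -> number
    return w


def cluster_representatives(words, key=naive_key):
    """Map every word to the most frequent member of its key cluster."""
    counts = Counter(words)          # keeps first-seen order of distinct words
    clusters = defaultdict(list)
    for word in counts:
        clusters[key(word)].append(word)

    mapping = {}
    for members in clusters.values():
        # max() returns the first maximal item, so ties go to the word seen first.
        representative = max(members, key=lambda w: counts[w])
        for word in members:
            mapping[word] = representative
    return mapping


def normalize_rows(rows, key=naive_key):
    """Split each row at whitespace, cluster all words, and join the rows back."""
    split_rows = [row.split() for row in rows]
    all_words = [word for row in split_rows for word in row]
    mapping = cluster_representatives(all_words, key)
    return [" ".join(mapping[word] for word in row) for row in split_rows]


rows = [
    "number queries waiting support representatives become available",
    "query numbers ",
]
for line in normalize_rows(rows):
    print(line)
```

Each sentence keeps its word order; only the individual words are swapped for their cluster representative. Words like "status" would be mangled by this toy key, so a real run would rely on OpenRefine's own clustering or a proper stemmer instead.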