Search code examples
data-manipulationopenrefine

Combine column x to n in OpenRefine


I have a table with an unknown number of columns, and I need to combine all columns after a certain point. Consider the following:

| A  | B  | C | D | E |
|----|----|---|---|---|
| 24 | 25 | 7 |   |   |
| 12 | 3  | 4 |   |   |
| 5  | 5  | 5 | 5 |   |

Columns A-C are known, and the information in them correct. But column D to N (an unknown number of columns starting with D) needs to be combined as they are all parts of the same string. How can I combine an unknown number of columns in OpenRefine?

As some columns may have empty cells (the string may be of various lengths) I also need to disregard empty cells.


Solution

  • There is a two step approach to this that should work for you.

    From the first column you want to merge (Col D in this case) choose Transpose->Transpose cells across columns into rows

    You will be asked to set some options. You'll want to choose 'From Column' D and 'To Column' N. Then choose to transpose into One Column, assign a name to that column, make sure the option to 'Ignore Blank Cells' is checked (should be checked by default. Then click Transpose.

    You'll get the values that were previously in cols D-N appearing in rows. e.g.

    | A  | B  | C | D | E | F |
    |----|----|---|---|---|---|
    | 1  | 2  | 3 | 4 | 5 | 6 |
    

    Transposes to:

    | A  | B  | C | new |
    |----|----|---|-----|
    | 1  | 2  | 3 | 4   |
    |    |    |   | 5   |
    |    |    |   | 6   |
    

    You can then use the dropdown menu from the head of the 'new' column to choose Edit cells->Join multi-value cells

    You'll be asked what character you want to use to separate the characters in the joined cell. Probably in your use case you can delete the joining character and combine the cells without any joining characters. This will give you:

    | A  | B  | C | new |
    |----|----|---|-----|
    | 1  | 2  | 3 | 456 |