random data-cleaning openrefine

How to make a random sample in OpenRefine?


We often need to extract a random sample from a large dataset. What is the best way to do this in OpenRefine? This might be useful for practitioners who are used to doing it in R and Python.

Thanks in advance for any advice!


Solution

  • OpenRefine has no built-in function for this, but you can use Python/Jython to create a new column of random integers. For example, if you have 100,000 rows:

    import random
    return random.randint(0, 100000)
    

    Then sort this column, reorder the rows permanently, and select, for example, the first thousand rows with a custom text facet:

    row.index < 1000
    

    EDIT: I forgot that this extension from @OwenStephens adds a randomNumber GREL function. Feel free to install it.
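
    For readers coming from Python, the sort-by-random-key idea described above can be sketched outside OpenRefine. This is only an illustration under assumptions: `sample_rows`, its `seed` parameter, and the synthetic `rows` list are hypothetical names and not part of OpenRefine. It also uses `random.random()` floats instead of integers, so that ties between keys are very unlikely.

```python
import random


def sample_rows(rows, k, seed=None):
    """Return k rows chosen at random, without replacement.

    Hypothetical helper mirroring the three steps of the answer:
    add a random column, sort on it, keep the first k rows.
    """
    rng = random.Random(seed)
    # Step 1: attach a random key to every row (the new "random" column).
    keyed = [(rng.random(), row) for row in rows]
    # Step 2: sort on the random key only (the "sort and reorder rows" step).
    keyed.sort(key=lambda item: item[0])
    # Step 3: keep the first k rows (the `row.index < k` facet).
    return [row for _, row in keyed[:k]]


# Synthetic stand-in for a 100,000-row project.
rows = ["row-%d" % i for i in range(100000)]
sample = sample_rows(rows, 1000, seed=42)
print(len(sample))  # 1000
```

    Passing a fixed `seed` makes the sample reproducible. In plain Python, `random.sample(rows, 1000)` gives an equivalent result in one call; the longer version is shown only because it matches the OpenRefine steps one to one.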
