python, pandas, dataframe, sframe

Efficient splitting of data in Python


Consider the following code:

one, two = sales.random_split(0.5, seed=0)
set_1, set_2 = one.random_split(0.5, seed=0)
set_3, set_4 = two.random_split(0.5, seed=0)

What I am trying to do in this code is randomly split the data in my sales SFrame (which is similar to a pandas DataFrame) into four roughly equal parts.

What is a Pythonic and efficient way to achieve this?


Solution

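  • import numpy as np
    # arr is assumed to be a NumPy array of the rows (or row indices) to split
    np.random.seed(0)              # fix the seed so the split is reproducible
    np.random.shuffle(arr)         # shuffles in place along the first axis
    sets = np.array_split(arr, 4)  # four parts whose sizes differ by at most one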
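
The snippet above works on a plain array. As a minimal sketch of how the same idea could be applied to a table, the example below shuffles row positions and slices a pandas DataFrame with them. It uses pandas rather than SFrame, so treat it as an analogy and not as SFrame code. The helper name split_into_parts and the sample columns order_id and amount are made up for illustration.

```python
import numpy as np
import pandas as pd


def split_into_parts(df, n_parts=4, seed=0):
    """Randomly split df into n_parts pieces of near-equal size.

    Hypothetical helper: shuffles row positions, then cuts them with
    np.array_split, which allows sizes that differ by at most one.
    """
    rng = np.random.default_rng(seed)      # seeded generator for reproducibility
    shuffled = rng.permutation(len(df))    # random ordering of row positions
    return [df.iloc[idx] for idx in np.array_split(shuffled, n_parts)]


# Made-up sample data standing in for the sales table.
sales = pd.DataFrame({"order_id": range(10), "amount": np.arange(10) * 1.5})

set_1, set_2, set_3, set_4 = split_into_parts(sales)
print([len(part) for part in (set_1, set_2, set_3, set_4)])  # [3, 3, 2, 2]
```

Compared with chaining three 50/50 random splits, a single shuffle followed by one cut gives part sizes that differ by at most one row, and every row lands in exactly one part.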