Tags: python, list, numpy, sublist, numpy-random

Split a list into n randomly sized chunks


I am trying to split a list of P elements into n sublists (n is I in the code below), where the size of each sublist is random and each sublist has at least one entry (assume P > I). I used numpy.split, which works but does not satisfy my randomness condition. You may ask which distribution the sizes should follow; I don't think it matters. The posts I checked were not equivalent to mine, since they split a list into chunks of almost equal size. If this is a duplicate, let me know. Here is my approach:

import numpy as np

P = 10
I = 5
mylist = range(1, P + 1)
[list(x) for x in np.split(np.array(mylist), I)]

This approach fails when P is not divisible by I. It also creates equal-sized chunks, not randomly sized ones. Another constraint: I do not want to use the random package, but I am fine with numpy. Don't ask me why; I wish I had a logical reason for it.
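To make the failure concrete, here is a minimal sketch of how np.split behaves when the length is not divisible by the number of sections. The names `data`, `split_failed`, and `sizes` are only illustrative, and np.array_split is shown purely for contrast: it tolerates the remainder but still produces nearly equal chunks, which is not the random sizing asked for here.

```python
import numpy as np

data = np.arange(1, 11)  # 10 elements, like P = 10

# np.split with an integer requires an equal division, so 3 sections fail.
try:
    np.split(data, 3)
    split_failed = False
except ValueError:
    split_failed = True

# np.array_split accepts the remainder but still aims for near-equal sizes.
sizes = [len(chunk) for chunk in np.array_split(data, 3)]
print(split_failed, sizes)
```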

Based on the answer provided by the mad scientist, this is the code I tried:

P = 10
I = 5

data = np.arange(P) + 1
indices = np.arange(1, P)
np.random.shuffle(indices)
indices = indices[:I - 1]
result = np.split(data, indices)
result

Output:

[array([1, 2]),
 array([3, 4, 5, 6]),
 array([], dtype=int32),
 array([4, 5, 6, 7, 8, 9]),
 array([10])]
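
The empty array and the overlapping chunk in this output look like what np.split produces when the cut points are not in ascending order. Below is a sketch that reproduces it, assuming the shuffle happened to yield the cut points [2, 6, 3, 9]. That specific draw is a guess that matches the output above, not something recorded from the actual run.

```python
import numpy as np

data = np.arange(1, 11)

# Hypothetical draw: the first I - 1 = 4 shuffled indices, left unsorted.
unsorted_cuts = np.array([2, 6, 3, 9])
unsorted_chunks = [c.tolist() for c in np.split(data, unsorted_cuts)]
# The slices taken are data[0:2], data[2:6], data[6:3], data[3:9], data[9:].
# data[6:3] is empty, and data[3:9] repeats elements that were already used.

# Sorting the same cut points gives a proper partition of the data.
sorted_chunks = [c.tolist() for c in np.split(data, np.sort(unsorted_cuts))]

print(unsorted_chunks)
print(sorted_chunks)
```

With sorted cut points every element appears exactly once and no chunk is empty.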

Solution

  • np.split is still the way to go. If you pass in a sequence of integers instead of a section count, np.split treats them as cut points. Generating random cut points is easy. You can do something like:

    P = 10
    I = 5
    
    data = np.arange(P) + 1
    # Draw cut points from 1..P-1; a cut at 0 would give an empty first chunk.
    indices = np.random.randint(1, P, size=I - 1)
    

    You want I - 1 cut points to get I chunks. The indices need to be sorted, and duplicates need to be removed; np.unique does both for you. Because duplicates are dropped, you may end up with fewer than I chunks this way:

    result = np.split(data, np.unique(indices))
    

    If you absolutely need exactly I chunks, sample the cut points without replacement. One way to implement that is with np.random.shuffle:

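    indices = np.arange(1, P)         # candidate cut points 1..P-1
    np.random.shuffle(indices)        # shuffles in place
    indices = indices[:I - 1]         # keep I - 1 distinct cut points
    indices.sort()                    # cut points must be ascending
    result = np.split(data, indices)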
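
The sampling-without-replacement idea can be wrapped in a small helper. This is a sketch under a few assumptions, not part of the original answer: the function name `random_chunks` and its parameters are made up for illustration, and it uses NumPy's `Generator` API (`np.random.default_rng` and `Generator.choice` with `replace=False`) in place of the in-place shuffle, which should give the same kind of result.

```python
import numpy as np

def random_chunks(data, n_chunks, rng=None):
    """Split `data` into `n_chunks` non-empty pieces of random size.

    `random_chunks` is a hypothetical helper name used for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    if not 1 <= n_chunks <= len(data):
        raise ValueError("need 1 <= n_chunks <= len(data)")
    if n_chunks == 1:
        return [data.tolist()]
    # Draw n_chunks - 1 distinct cut points from 1..len(data)-1, then sort them.
    cuts = rng.choice(np.arange(1, len(data)), size=n_chunks - 1, replace=False)
    cuts.sort()
    return [chunk.tolist() for chunk in np.split(data, cuts)]

# Seeded generator so the run is repeatable; the seed value is arbitrary.
chunks = random_chunks(range(1, 11), 5, rng=np.random.default_rng(0))
print(chunks)
```

Because the cut points are distinct and lie strictly between 0 and len(data), every chunk has at least one element and the chunks concatenate back to the original order.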