
Slice every nth element, not beginning with the first


How can I elegantly split a Python list into two, so that the second list contains every nth element of the first, and those elements are removed from the first list? The selection should not begin with the first element!

Example:

split_data([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])

should return ([1,2,3,4,6,7,8,9,11,12,13,14], [5,10,15])

Thank you :)

Edit: For the part of selecting every nth element, I tried the following:

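def split_data(data):
    test = data[::5]      # starts at index 0, i.e. with the first element
    train = data          # same list object as data, not a copy
    del data[::5]
    return (train, test)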

This, however, only returns ([2, 3, 4, 5, 7, 8, 9, 10, 12, ...], [1, 6, 11, 16, 21, 26]) for split_data(list(range(1, 30))), because the slice starts at the first element (index 0). By "elegant" I mean that I want to avoid iterating over the list with a for-loop ;)
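The attempt above has two issues: the slice starts at index 0, and `train = data` binds a second name to the same list instead of copying it, so `del` modifies the caller's list too. A minimal loop-free sketch using plain slicing (not taken from the answers below; the parameter `n` with default 5 is an assumption based on the example) shifts the slice start to `n - 1` and deletes from a copy:

```python
def split_data(data, n=5):
    """Split data into (train, test), where test holds every nth element.

    Positions are counted from 1, so with n=5 the elements at positions
    5, 10, 15, ... (indices 4, 9, 14, ...) go to test.
    """
    test = data[n - 1::n]   # start the slice at index n-1 instead of 0
    train = list(data)      # copy, so the caller's list is not modified
    del train[n - 1::n]     # drop the same positions from the copy
    return train, test


print(split_data([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]))
# ([1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14], [5, 10, 15])
```

Deleting an extended slice (`del lst[start::step]`) is supported for lists, so no explicit loop is needed.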


Solution

  • You can take advantage of list.pop(), which removes an element by index and also returns it. Your original list will then no longer contain those numbers, and the popped items form your second list.

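    def split(l, n):
        # Pops from l while iterating over indices computed from its original
        # length; raises IndexError for longer inputs, e.g. list(range(1, 26)) with n=4.
        return (l, [l.pop(i) for i in range(n, len(l), n)])
    
    >>> l = list(range(1, 16))
    >>> l
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
    
    >>> split(l, 4)
    ([1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14], [5, 10, 15])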
    
    

    This mutates the list passed as the argument. If you want the function to leave it unchanged, add l = list(l) at the top so it works on a copy.

    As @crazyvalues pointed out, this answer is not valid: list.pop() shrinks the list while the indices from range() were computed from its original length, so the indices eventually go out of bounds and an IndexError is raised.
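One way to keep the list.pop() idea, sketched here as an assumption rather than the original answer's code (the name `split_pop` is hypothetical), is to pop from the highest index downward. Each pop then only shifts elements that were already handled, so the precomputed indices stay valid. Note that `n` here counts positions from 1, unlike the `split(l, 4)` call above, which yields every 5th element.

```python
def split_pop(l, n):
    """Remove every nth element (positions n, 2n, ...) from a copy of l using list.pop().

    Popping from the highest index downward means each pop only shifts
    elements at higher indices, which have already been visited.
    """
    l = list(l)                                     # work on a copy
    indices = range(n - 1, len(l), n)               # 0-based indices of every nth element
    popped = [l.pop(i) for i in reversed(indices)]  # pop back to front
    popped.reverse()                                # restore original order
    return l, popped


print(split_pop(range(1, 16), 5))
# ([1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14], [5, 10, 15])
```

Each pop is O(len(l)) in the worst case, so the slicing approach with `del lst[start::step]` is usually simpler.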

    However, using numpy or pandas one can still solve the same problem "elegantly":

    • numpy:
    import numpy as np

    def slice_numpy(l, n):
        l = np.array(l)
        mask = list(range(n-1, len(l), n))   # indices n-1, 2n-1, ...: every nth element, skipping the first
        return np.delete(l, mask), l[mask]
        # note that `len` and `delete` work as expected on 1D arrays; for a 2D dataset you need to adjust them (e.g. the `axis` argument of np.delete)
    
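    • pandas:
    import pandas as pd

    def slice_pandas(l, n):
        l = pd.Series(l)
        mask = list(range(n-1, len(l), n))
        # drop() and [] treat mask as index labels; with the default RangeIndex these equal the positions
        return l.drop(mask), l[mask]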
    
    
    • Example:
    >>> l = list(range(1, 16))
    >>> l
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
    >>> train, test = slice_numpy(l, 5)
    >>> train
    array([ 1,  2,  3,  4,  6,  7,  8,  9, 11, 12, 13, 14])
    >>> test
    array([ 5, 10, 15])
    >>> train, test = slice_pandas(l, 5)
    >>> train
    0      1
    1      2
    2      3
    3      4
    5      6
    6      7
    7      8
    8      9
    10    11
    11    12
    12    13
    13    14
    dtype: int64
    >>> test
    4      5
    9     10
    14    15
    dtype: int64
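An alternative NumPy sketch (an assumption, not part of the original answer; the name `slice_numpy_mask` is hypothetical) builds a boolean mask with `np.arange` and the modulo operator instead of an index list plus `np.delete`. For 1-D input it returns the same values as `slice_numpy`:

```python
import numpy as np


def slice_numpy_mask(l, n):
    """Split a 1-D sequence into (kept, every nth element) with a boolean mask."""
    a = np.asarray(l)
    is_nth = np.arange(len(a)) % n == n - 1   # True at indices n-1, 2n-1, ...
    return a[~is_nth], a[is_nth]


train, test = slice_numpy_mask(list(range(1, 16)), 5)
print(train.tolist(), test.tolist())
# [1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14] [5, 10, 15]
```

Because the mask is a plain boolean array, the same `is_nth` could also be used to index a pandas Series positionally via `.iloc`.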