How to divide a collection into subgroups in functional programming way?

I'm so confused that how to divide a collection into subgroups in a functional programming manner since this operation depends on other elements in the same collection.

One of these problem is to divide a video clip into several GOP groups.

A video clip can be viewed as a list with different types of frames in it, eg. [IPBBBPBBBIPBIPBPBP], and I, P, B is type of the frame.

A GOP is a sub clip that start with an I frame (though in real world it must be an IDR frame, I simplify the question here a little bit)

So the problem is how to achieve this in a FP way: [IPBBBPBBBIPBIPBPBP] -> [[IPBBBPBBB],[IPB],[IPBPBP]] ?

I have read some FP books and I am so confused how can I use just map, filter and reduce to achieve this.

In Imperative Programming language such as Python I could easily write something like this:

gops = []
for frame in video:
    if is_i_frame(frame) is True:
        gops.append([frame])
    else:
        gops[:-1].append(frame)

But in Functional Programming world I can't figure out how to do this or how to represent this problem in Lambda Calculus expression.

So I hope someone can show me how to do this in a FP way. Is it possible to do such FP operation in Python ? And how will the Lambda Calculus expression looks like ?

Solution

Using a combinations of itertools and functools you can get to a fully functional solution. (Though, way longer, and, in my opinion less readable, than the pythonic version you mentioned in your question.)

To break down the steps I am using, first, I am creating an index for which sub-video, each frame belongs to:

import itertools as it
import functools as ft

list(it.accumulate(video, lambda total, frame: total + (frame=="I"), initial=0))
>>> [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3]

Them, I have to remove the extra 0 at the start, so I use itertools.islice:

list(it.islice(
    it.accumulate(video, lambda total, frame: total + (frame=="I"), initial=0), 1, None
))
>>> [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3]

Then, I will zip the sublist index, with the original video array:

zip(
    it.islice(
        it.accumulate(video, lambda total, frame: total + (frame=="I"), initial=0), 1, None
    ),
    video
),

Now, the meat of the operation: I will use itertools.groupby, to group the elements of video by the index we just computed:

it.groupby(
    zip(
        it.islice(
            it.accumulate(video, lambda total, frame: total + (frame=="I"), initial=0), 1, None
        ),
        video
    ),
    key=lambda x: x[0]
)

The group by will return an object that looks like [[(1,I), (1,P),...], [(2,I),...], ...], so we want to reduce that to proper strings, so we can use functools.reduce, and map it over all the groups:

list(map(
    lambda k_g: ft.reduce(lambda total, x: total+x[1], k_g[1], ""),
    it.groupby(
        zip(
            it.islice(
                it.accumulate(video, lambda total, frame: total + (frame=="I"), initial=0), 1, None
            ),
            video
        ),
        key=lambda x: x[0]
    )
))
>>> ['IPBBBPBBB', 'IPB', 'IPBPBP']

Notice that, I used (frame=="I"), but you can very well replace this with more advanced logic is_i_frame for instance, as long as it returns a boolean.