Search code examples
pythonfunctional-programming

How to divide a collection into subgroups in functional programming way?


I'm so confused that how to divide a collection into subgroups in a functional programming manner since this operation depends on other elements in the same collection.

One of these problem is to divide a video clip into several GOP groups.

A video clip can be viewed as a list with different types of frames in it, eg. [IPBBBPBBBIPBIPBPBP], and I, P, B is type of the frame.

A GOP is a sub clip that start with an I frame (though in real world it must be an IDR frame, I simplify the question here a little bit)

So the problem is how to achieve this in a FP way: [IPBBBPBBBIPBIPBPBP] -> [[IPBBBPBBB],[IPB],[IPBPBP]] ?

I have read some FP books and I am so confused how can I use just map, filter and reduce to achieve this.

In Imperative Programming language such as Python I could easily write something like this:

gops = []
for frame in video:
    if is_i_frame(frame) is True:
        gops.append([frame])
    else:
        gops[:-1].append(frame)

But in Functional Programming world I can't figure out how to do this or how to represent this problem in Lambda Calculus expression.

So I hope someone can show me how to do this in a FP way. Is it possible to do such FP operation in Python ? And how will the Lambda Calculus expression looks like ?


Solution

  • Using a combinations of itertools and functools you can get to a fully functional solution. (Though, way longer, and, in my opinion less readable, than the pythonic version you mentioned in your question.)

    To break down the steps I am using, first, I am creating an index for which sub-video, each frame belongs to:

    import itertools as it
    import functools as ft
    
    list(it.accumulate(video, lambda total, frame: total + (frame=="I"), initial=0))
    >>> [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3]
    

    Them, I have to remove the extra 0 at the start, so I use itertools.islice:

    list(it.islice(
        it.accumulate(video, lambda total, frame: total + (frame=="I"), initial=0), 1, None
    ))
    >>> [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3]
    

    Then, I will zip the sublist index, with the original video array:

    zip(
        it.islice(
            it.accumulate(video, lambda total, frame: total + (frame=="I"), initial=0), 1, None
        ),
        video
    ),
    

    Now, the meat of the operation: I will use itertools.groupby, to group the elements of video by the index we just computed:

    it.groupby(
        zip(
            it.islice(
                it.accumulate(video, lambda total, frame: total + (frame=="I"), initial=0), 1, None
            ),
            video
        ),
        key=lambda x: x[0]
    )
    

    The group by will return an object that looks like [[(1,I), (1,P),...], [(2,I),...], ...], so we want to reduce that to proper strings, so we can use functools.reduce, and map it over all the groups:

    list(map(
        lambda k_g: ft.reduce(lambda total, x: total+x[1], k_g[1], ""),
        it.groupby(
            zip(
                it.islice(
                    it.accumulate(video, lambda total, frame: total + (frame=="I"), initial=0), 1, None
                ),
                video
            ),
            key=lambda x: x[0]
        )
    ))
    >>> ['IPBBBPBBB', 'IPB', 'IPBPBP']
    

    Notice that, I used (frame=="I"), but you can very well replace this with more advanced logic is_i_frame for instance, as long as it returns a boolean.