
Chunking a list by delimiter in Python


What is the correct way to chunk a list of the following form: ["record_a:", "x"*N, "record_b:", "y"*M, ...], i.e. a list where the start of each record is marked by a string ending in ":", and each record includes all the elements up to the start of the next record? So the following list:

["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]

would be split into:

[["record_a", "a", "b"], ["record_b", "1", "2", "3", "4"]]

The list contains an arbitrary number of records, and each record contains an arbitrary number of list items (up until the next record begins or there are no more records). How can this be done efficiently?


Solution

  • Use a generator:

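    def chunkRecords(records):
        record = []
        for r in records:
            # A string ending in ':' starts a new record.
            # endswith() is safe for empty strings, unlike r[-1].
            if r.endswith(':'):
                if record:
                    yield record
                # Begin the new record with the name, colon stripped.
                record = [r[:-1]]
            else:
                record.append(r)
        # Emit the final record, if any.
        if record:
            yield record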
    

    Then loop over that:

    for record in chunkRecords(records):
        # record is a list, e.g. ['record_a', 'a', 'b']
        print(record)
    

    or turn it into a list again:

    records = list(chunkRecords(records))
    

    The latter results in:

    >>> records = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
    >>> records = list(chunkRecords(records))
    >>> records
    [['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
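
If you always want the full list rather than a lazy iterator, the same single pass can build the result directly. The following is only a sketch of that variant, not part of the answer above: the name `chunk_records_list` is hypothetical, and it assumes every item is a string. Like the generator, it is O(n) and puts any items that appear before the first header into a nameless first chunk.

```python
def chunk_records_list(items):
    """Hypothetical list-returning variant of chunkRecords (assumes str items)."""
    chunks = []
    for item in items:
        if item.endswith(":"):
            # A header starts a new chunk; drop the trailing colon.
            chunks.append([item[:-1]])
        elif chunks:
            # Ordinary item: attach it to the most recent chunk.
            chunks[-1].append(item)
        else:
            # Items seen before any header form a nameless first chunk,
            # which matches the generator's behaviour.
            chunks.append([item])
    return chunks


data = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
print(chunk_records_list(data))
# [['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
```

The generator remains the better fit when the input is large or when records are processed one at a time, since it never holds more than one record in memory.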