python arrays numpy grouping

How to group consecutive data in a 2D array in Python


I have a 2d NumPy array that looks like this:

array([[1, 1],
       [1, 2],
       [2, 1],
       [2, 2],
       [3, 1],
       [5, 1],
       [5, 2]])

and I want to group it and have an output that looks something like this:

         Col1 Col2
group 1: 1-2, 1-2
group 2: 3-3, 1-1
group 3: 5-5, 1-2

I want to group the columns based on whether their values are consecutive.

So, for each unique value in column 1, group the values in column 2 if they are consecutive between rows. Then, for each unique grouping of column 2, group the column 1 values if they are consecutive between rows.
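To make the "consecutive" rule concrete, here is a minimal sketch of the first step only: splitting the column 2 values of each column 1 value into runs of consecutive integers. The helper name `consecutive_runs` is hypothetical, and the sketch assumes the values are integers sorted in ascending order.

```python
import numpy as np

def consecutive_runs(values):
    """Split a sorted 1-D sequence of integers into (min, max) runs of consecutive values."""
    values = np.asarray(values)
    if values.size == 0:
        return []
    # A new run starts wherever the step between neighbours is not exactly 1.
    breaks = np.flatnonzero(np.diff(values) != 1) + 1
    return [(int(run[0]), int(run[-1])) for run in np.split(values, breaks)]

data = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [5, 1], [5, 2]])

# Column 2 runs for each unique column 1 value.
col2_runs = {int(v): consecutive_runs(data[data[:, 0] == v, 1])
             for v in np.unique(data[:, 0])}
print(col2_runs)
# {1: [(1, 2)], 2: [(1, 2)], 3: [(1, 1)], 5: [(1, 2)]}
```

Column 1 values 1 and 2 share the same column 2 run and are themselves consecutive, which is why they end up in one group.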

The result can be thought of as the corner points of a grid. In the above example, group 1 is a square grid, group 2 is a point, and group 3 is a flat line.
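One way to check the grid interpretation is to expand each group back into the points it covers and compare with the input. This is only a sketch: `expand_group` is a hypothetical helper, and the ranges are assumed to be inclusive integer bounds.

```python
import numpy as np
from itertools import product

def expand_group(col1_range, col2_range):
    """Return every (col1, col2) point covered by a group's inclusive ranges."""
    (a0, a1), (b0, b1) = col1_range, col2_range
    return np.array(list(product(range(a0, a1 + 1), range(b0, b1 + 1))))

# The three groups from the desired output above.
groups = [((1, 2), (1, 2)), ((3, 3), (1, 1)), ((5, 5), (1, 2))]
rebuilt = np.vstack([expand_group(c1, c2) for c1, c2 in groups])
print(rebuilt.tolist())
# [[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [5, 1], [5, 2]]
```

The rebuilt points match the original array row for row, so the three groups describe the input without loss.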

My system won't allow me to use pandas, so I cannot use its groupby, but I can use other standard libraries.

Any help is appreciated. Thank you.


Solution

  • Here you go ...

    Steps are:

    • Get a list xUnique of unique column 1 values with sort order preserved.
    • Build a list xRanges of items of the form [col1_value, [col2_min, col2_max]] holding the column 2 ranges for each column 1 value.
    • Build a list xGroups of items of the form [[col1_min, col1_max], [col2_min, col2_max]] where the [col1_min, col1_max] part is created by merging the col1_value part of consecutive items in xRanges if they differ by 1 and have identical [col2_min, col2_max] value ranges for column 2.
    • Turn the ranges in each item of xGroups into strings and print with the required row and column headings.
    • Also package and print as a numpy.array to match the form of the input.

    import numpy as np

    data = np.array([
        [1, 1],
        [1, 2],
        [2, 1],
        [2, 2],
        [3, 1],
        [5, 1],
        [5, 2]])

    # Unique column 1 values in ascending order (np.unique sorts; a plain set does not guarantee order).
    xUnique = np.unique(data[:, 0]).tolist()
    # One [col1_value, [col2_min, col2_max]] entry per unique column 1 value.
    xRanges = [[x, [0, 0]] for x in xUnique]
    rows, cols = data.shape
    iRange = -1
    for i in range(rows):
        # A larger column 1 value starts a new range (rows are assumed sorted by column 1, then column 2).
        if i == 0 or data[i, 0] > data[i - 1, 0]:
            iRange += 1
            xRanges[iRange][1][0] = int(data[i, 1])
        xRanges[iRange][1][1] = int(data[i, 1])

    # Merge neighbouring column 1 values that differ by 1 and share the same column 2 range.
    xGroups = []
    for i in range(len(xRanges)):
        if i and xRanges[i][0] - xRanges[i - 1][0] == 1 and xRanges[i][1] == xRanges[i - 1][1]:
            xGroups[-1][0][1] = xRanges[i][0]
        else:
            xGroups.append([[xRanges[i][0], xRanges[i][0]], xRanges[i][1]])

    # Turn each [min, max] pair into a 'min-max' string.
    xGroupStrs = [[f'{a}-{b}' for a, b in row] for row in xGroups]

    groupArray = np.array(xGroupStrs)
    print(groupArray)

    print()
    print(f'{"":<10}{"Col1":<8}{"Col2":<8}')
    for i, (col1, col2) in enumerate(xGroupStrs):
        print(f'{"group " + str(i) + ":":<10}{col1:<8}{col2:<8}')
    

    Output:

    [['1-2' '1-2']
     ['3-3' '1-1']
     ['5-5' '1-2']]
    
              Col1    Col2
    group 0:  1-2     1-2
    group 1:  3-3     1-1
    group 2:  5-5     1-2
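The code above takes the first and last column 2 value for each column 1 value, so it relies on column 2 having no gaps within a column 1 value. As a hedged variant (not the answer's method), the sketch below wraps the same idea in a function that also splits column 2 into separate runs when there is a gap. The name `group_consecutive` and the flat `[col1_min, col1_max, col2_min, col2_max]` output layout are assumptions made for illustration, and the input is assumed to be a non-empty (n, 2) integer array.

```python
import numpy as np

def group_consecutive(data):
    """Group rows of an (n, 2) integer array into [c1_min, c1_max, c2_min, c2_max] blocks."""
    # np.unique with axis=0 sorts the rows lexicographically and drops duplicates.
    data = np.unique(np.asarray(data), axis=0)
    groups = []
    open_groups = {}  # column 2 range -> group that ended at the previous column 1 value
    prev_x = None
    for x in np.unique(data[:, 0]):
        x = int(x)
        ys = data[data[:, 0] == x, 1]
        # Split the column 2 values wherever neighbours are not consecutive.
        breaks = np.flatnonzero(np.diff(ys) != 1) + 1
        still_open = {}
        for run in np.split(ys, breaks):
            y_range = (int(run[0]), int(run[-1]))
            # Extend a group only if column 1 advanced by exactly 1 and the range matches.
            adjacent = prev_x is not None and x - prev_x == 1
            group = open_groups.get(y_range) if adjacent else None
            if group is None:
                group = [x, x, *y_range]
                groups.append(group)
            else:
                group[1] = x
            still_open[y_range] = group
        open_groups = still_open
        prev_x = x
    return groups

data = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [5, 1], [5, 2]])
for i, (a0, a1, b0, b1) in enumerate(group_consecutive(data), start=1):
    print(f"group {i}: {a0}-{a1}, {b0}-{b1}")
# group 1: 1-2, 1-2
# group 2: 3-3, 1-1
# group 3: 5-5, 1-2
```

Here the groups are numbered from 1 to match the layout requested in the question. Sorting inside the function means the caller does not have to pre-sort the rows.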