
Complex NumPy Array Manipulation


I have two NumPy arrays, for example:

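import numpy as np

# Columns: first key, second key, category (0 or 1)
array_1 = np.array([
    [5, 2, 0],
    [4, 3, 0],
    [4, 2, 0],
    [3, 2, 1],
    [4, 1, 1],
])

# Columns: first key, second key, value to average
array_2 = np.array([
    [5, 2, 10],
    [4, 2, 52],
    [3, 2, 80],
    [1, 2, 4],
    [5, 3, 6],
])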

In array_1, the 0 and 1 at index 2 represent two different categories. For argument's sake, say 0 = Red and 1 = Blue.

Where the first two elements of a row match in both arrays, I need to average the third element of array_2 by category. For example, [5,2,10] and [4,2,52] both match rows of category 0, i.e. Red, so the code should return the average of their elements at index 2 for the Red category. It should do the same for the Blue category.

I have no idea where to start with this; any ideas are welcome.


Solution

  • You tagged your post NumPy because of the type of the source arrays, but the result is much easier and more intuitive to generate with pandas.

    Start by converting both arrays to pandas DataFrames. While converting the first array, also replace 0 and 1 in the last column with Red and Blue:

    import pandas as pd
    
    df1 = pd.DataFrame(array_1, columns=['A', 'B', 'key'])
    df1['key'] = df1['key'].replace({0: 'Red', 1: 'Blue'})
    df2 = pd.DataFrame(array_2, columns=['A', 'B', 'C'])
    

    Then, to generate the result, run:

    result = df2.merge(df1, on=['A', 'B']).groupby('key').C.mean().rename('Mean')
    

    The result is:

    key
    Blue    80.0
    Red     31.0
    Name: Mean, dtype: float64
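
The steps above can be checked end to end with a self-contained sketch. It rebuilds the arrays from the question and wraps the conversion, merge and grouping in one function. The function name mean_by_category and its labels parameter are hypothetical, introduced only for illustration, and Series.map is used here in place of replace to assign the labels.

```python
import numpy as np
import pandas as pd

# Arrays from the question: (A, B, category) and (A, B, value).
array_1 = np.array([[5, 2, 0], [4, 3, 0], [4, 2, 0], [3, 2, 1], [4, 1, 1]])
array_2 = np.array([[5, 2, 10], [4, 2, 52], [3, 2, 80], [1, 2, 4], [5, 3, 6]])


def mean_by_category(categories, values, labels):
    """Hypothetical helper: mean of the third column of `values` per category."""
    df1 = pd.DataFrame(categories, columns=['A', 'B', 'key'])
    # Translate the numeric category codes into readable labels.
    df1['key'] = df1['key'].map(labels)
    df2 = pd.DataFrame(values, columns=['A', 'B', 'C'])
    # Inner join on (A, B), then average C within each category.
    merged = df2.merge(df1, on=['A', 'B'])
    return merged.groupby('key')['C'].mean().rename('Mean')


result = mean_by_category(array_1, array_2, {0: 'Red', 1: 'Blue'})
print(result)
```

Keeping the labels in a dictionary argument means a third category only needs a new dictionary entry, not a change to the function.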
    

    Details:

    1. df2.merge(df1, on=['A', 'B']) - Generates:

         A  B   C   key
      0  5  2  10   Red
      1  4  2  52   Red
      2  3  2  80  Blue
      

      Because merge performs an inner join by default, it also drops the rows of df2 whose (A, B) pair has no match in df1, i.e. rows that are neither Red nor Blue.

    2. groupby('key') - From the above result, generates groups by key (Red / Blue).

    3. C.mean() - The last step takes the C column from each group and computes its mean.

    4. The result is a Series with:

      • index - the grouping key,
      • value - the value computed for the corresponding group.
    5. rename('Mean') - Changes the name of the Series from the source column name (C) to the more meaningful Mean.
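
Since the question was tagged NumPy, the same matching and averaging can also be sketched without pandas. This is only one possible approach, not part of the answer above: it compares every row pair through broadcasting, so it assumes the arrays are small enough for an n-by-m boolean matrix to fit in memory. As with the merge, a pair that appears several times in array_1 would contribute its value once per occurrence.

```python
import numpy as np

# Arrays from the question: (A, B, category) and (A, B, value).
array_1 = np.array([[5, 2, 0], [4, 3, 0], [4, 2, 0], [3, 2, 1], [4, 1, 1]])
array_2 = np.array([[5, 2, 10], [4, 2, 52], [3, 2, 80], [1, 2, 4], [5, 3, 6]])

# match[i, j] is True when row i of array_2 and row j of array_1
# agree on both of their first two elements.
match = (array_2[:, None, :2] == array_1[None, :, :2]).all(axis=2)

# Row indices of every matching pair.
rows_2, rows_1 = np.nonzero(match)
categories = array_1[rows_1, 2]  # category of each matched pair
values = array_2[rows_2, 2]      # value of each matched pair

# Average the matched values separately for each category code.
means = {
    int(c): float(values[categories == c].mean())
    for c in np.unique(categories)
}
print(means)  # {0: 31.0, 1: 80.0}
```

Here 0 stands for Red and 1 for Blue, so the output agrees with the pandas result.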