I have two NumPy arrays, e.g.:
import numpy as np
array_1 = np.array([
    [5, 2, 0],
    [4, 3, 0],
    [4, 2, 0],
    [3, 2, 1],
    [4, 1, 1],
])
array_2 = np.array([
    [5, 2, 10],
    [4, 2, 52],
    [3, 2, 80],
    [1, 2, 4],
    [5, 3, 6],
])
In array_1, the 0 and 1 at index 2 represent two different categories. For argument's sake, say 0 = Red and 1 = Blue.
So, where the first two elements of a row in array_1 match the first two elements of a row in array_2, I need to average the third element of array_2 by category. For example, [5,2,10] and [4,2,52] both match rows with category 0, i.e. Red, so the code should return the average of their elements at index 2 for the Red category. It should do the same for the Blue category.
I have no idea where to start with this; any ideas are welcome.
You tagged your post NumPy because of the type of the source arrays, but the result is much easier and more intuitive to generate with pandas.
Start by converting both arrays to pandas DataFrames. While converting the first array, also replace the 0 and 1 in the last column with Red and Blue:
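import pandas as pd
df1 = pd.DataFrame(array_1, columns=['A', 'B', 'key'])
# Assign the result back: replace(..., inplace=True) called on df1.key
# may not update df1 itself in recent pandas versions.
df1['key'] = df1['key'].replace({0: 'Red', 1: 'Blue'})
df2 = pd.DataFrame(array_2, columns=['A', 'B', 'C'])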
Then, to generate the result, run:
result = df2.merge(df1, on=['A', 'B']).groupby('key').C.mean().rename('Mean')
The result is:
key
Blue    80.0
Red     31.0
Name: Mean, dtype: float64
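For reference, the steps above can be run end to end as a single script. This is only a sketch: the arrays are copied from the question, and the key column is relabelled with map, which is just one way to do it.

```python
import numpy as np
import pandas as pd

# Sample arrays copied from the question
array_1 = np.array([[5, 2, 0], [4, 3, 0], [4, 2, 0], [3, 2, 1], [4, 1, 1]])
array_2 = np.array([[5, 2, 10], [4, 2, 52], [3, 2, 80], [1, 2, 4], [5, 3, 6]])

# Label the category column while building the first DataFrame
df1 = pd.DataFrame(array_1, columns=['A', 'B', 'key'])
df1['key'] = df1['key'].map({0: 'Red', 1: 'Blue'})
df2 = pd.DataFrame(array_2, columns=['A', 'B', 'C'])

# Inner join on (A, B), then the mean of C per category
result = df2.merge(df1, on=['A', 'B']).groupby('key')['C'].mean().rename('Mean')
print(result)
```

Note that mean() returns floats even when column C holds integers.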
Details:
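df2.merge(df1, on=['A', 'B'])
- Performs an inner join on columns A and B, which generates:
   A  B   C   key
0  5  2  10   Red
1  4  2  52   Red
2  3  2  80  Blue
  At the same time it drops every row whose (A, B) pair does not appear in both DataFrames, so such rows end up in neither the Red nor the Blue group.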
groupby('key')
- Splits the above result into groups by key (Red / Blue).
C.mean()
- The last computation step: takes column C from each group and computes its mean. The result is a Series indexed by key.
rename('Mean')
- Changes the name of the Series from the source column name (C) to the more meaningful Mean.
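If you would rather stay in NumPy, the same three steps (match on the first two elements, group by category, take the mean) can be sketched with broadcasting. This is a rough alternative, not a replacement for the pandas version. It assumes each (first, second) pair occurs at most once in array_1, and the names match, rows_1, rows_2 and means are mine.

```python
import numpy as np

# Sample arrays copied from the question
array_1 = np.array([[5, 2, 0], [4, 3, 0], [4, 2, 0], [3, 2, 1], [4, 1, 1]])
array_2 = np.array([[5, 2, 10], [4, 2, 52], [3, 2, 80], [1, 2, 4], [5, 3, 6]])

# match[i, j] is True when row i of array_2 and row j of array_1
# agree on their first two elements (the "merge" step)
match = (array_2[:, None, :2] == array_1[None, :, :2]).all(axis=2)
rows_2, rows_1 = np.nonzero(match)

values = array_2[rows_2, 2]      # third element of each matched array_2 row
categories = array_1[rows_1, 2]  # category (0 or 1) of the matching array_1 row

# The "groupby + mean" step: one mean per category that has matches
means = {int(c): float(values[categories == c].mean()) for c in np.unique(categories)}
print(means)  # {0: 31.0, 1: 80.0}
```

The comparison builds a len(array_2) x len(array_1) boolean matrix, so this is only reasonable for small inputs; for larger data the merge-based approach is the safer choice.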