I am trying to compute a median with a "not equal" condition on a column, the pandas equivalent of Excel's MEDIAN(IF(...)). I can group by a column and take the median, but I have not been able to compute a grouped median that excludes the rows where another column equals a given value.
import pandas as pd
# Initialise the data as a dict of lists.
data={'id':[ 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
'var1':['var1','var1','var1','var1','var1','var1','var1','var1','var1','var1','var2','var2','var2','var2','var2'],
'var2':[ 'A','A','A','B','B','B','C','C','C','A','A','A','B','B','C'],
'var3':[ 'A','A','A','A','A','A','A','A','A','A','A','A','A','A','A'],
'values':[ 10,870,1731,80,110,3848,3590,344,30,60,60,190,440,780,1460]}
# Create the DataFrame.
df=pd.DataFrame(data)
Excel Formula:-
=MEDIAN(IF($B:$B=H2,IF($C:$C<>$I2,$E:$E)))
Column reference:
B = var1 (input), H = the var1 value of the output row, C = var2 (input), I = the var2 value of the output row, E = values (input).
Desired Output -
var1,var2,median
var1,A,227
var1,B,344
var1,C,110
var2,A,780
var2,B,190
var2,C,315
I am trying to write the equivalent of a MEDIAN with multiple IF conditions in pandas. The Excel formula I use is given above.
EDIT - Completely rewrote this answer.
I think you want this, given your data dictionary.
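import pandas as pd
from itertools import product

df = pd.DataFrame(data)  # `data` is the dictionary from the question
res = {'input1': [], 'input2': [], 'results': []}
# Visit every (var1, var2) combination; zip() over two sets would pair them arbitrarily.
for i1, i2 in product(sorted(set(data['var1'])), sorted(set(data['var2']))):
    # Keep rows in the same var1 group whose var2 is NOT equal to i2.
    temp = df[(df['var1'] == i1) & (df['var2'] != i2)]
    row_median = temp['values'].median()
    res['input1'].append(i1)
    res['input2'].append(i2)
    res['results'].append(row_median)
print(pd.DataFrame(res))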
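The Excel formula from the question can also be wrapped in a small reusable function. This is only a sketch: the name median_if_not_equal and its parameters are made up for illustration, and it assumes every var2 value that appears anywhere in the frame should get an output row for each var1 group.

```python
import pandas as pd

# Same data as in the question.
data = {'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        'var1': ['var1'] * 10 + ['var2'] * 5,
        'var2': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'C'],
        'var3': ['A'] * 15,
        'values': [10, 870, 1731, 80, 110, 3848, 3590, 344, 30, 60, 60, 190, 440, 780, 1460]}
df = pd.DataFrame(data)


def median_if_not_equal(frame, group_col='var1', exclude_col='var2', value_col='values'):
    """Hypothetical helper: median of value_col per group, excluding one exclude_col value at a time."""
    rows = []
    excluded_values = sorted(frame[exclude_col].unique())
    # groupby() yields one sub-frame per var1 value, in sorted key order.
    for group, sub in frame.groupby(group_col):
        for excluded in excluded_values:
            # Mirrors IF(B=H, IF(C<>I, E)): same group, different var2.
            med = sub.loc[sub[exclude_col] != excluded, value_col].median()
            rows.append({group_col: group, exclude_col: excluded, 'median': med})
    return pd.DataFrame(rows)


result = median_if_not_equal(df)
print(result)
```

With the question's data, the median column should match the desired output (227, 344, 110, 780, 190, 315). If a group has no rows left after the exclusion, median() returns NaN for that row.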