python excel pandas group-by median

MEDIAN IFS using groupby in Python pandas


I am trying to compute a median with a "not equal to" condition on a column, like a conditional MEDIAN(IF(...)) in Excel. I can group by a column and take the median, but I have not been able to compute a grouped median that excludes the rows where another column equals a given value.

import pandas as pd

# Initialise the data as a dict of lists.
data = {'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        'var1': ['var1', 'var1', 'var1', 'var1', 'var1', 'var1', 'var1', 'var1', 'var1', 'var1', 'var2', 'var2', 'var2', 'var2', 'var2'],
        'var2': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'C'],
        'var3': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
        'values': [10, 870, 1731, 80, 110, 3848, 3590, 344, 30, 60, 60, 190, 440, 780, 1460]}

# Create the DataFrame.
df = pd.DataFrame(data)

Excel formula:
=MEDIAN(IF($B:$B=H2,IF($C:$C<>$I2,$E:$E)))
Column reference:
B - var1 (input), H - var1 in the output table below, C - var2 (input), I - var2 in the output table below, E - values (input).
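
For reference, the condition in this formula for a single output row can be read as a boolean mask in pandas: keep the rows whose var1 equals the row's var1 and whose var2 differs from the row's var2, then take the median of values. The sketch below assumes that reading of the formula; `median_if_not` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

data = {'id': list(range(1, 16)),
        'var1': ['var1'] * 10 + ['var2'] * 5,
        'var2': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'C'],
        'var3': ['A'] * 15,
        'values': [10, 870, 1731, 80, 110, 3848, 3590, 344, 30, 60, 60, 190, 440, 780, 1460]}
df = pd.DataFrame(data)


def median_if_not(frame, var1_value, var2_value):
    """Median of `values` where var1 == var1_value and var2 != var2_value.

    Hypothetical helper mirroring MEDIAN(IF(B=H2, IF(C<>I2, E))).
    """
    mask = (frame['var1'] == var1_value) & (frame['var2'] != var2_value)
    return frame.loc[mask, 'values'].median()


print(median_if_not(df, 'var1', 'A'))  # 227.0
```

Here the mask selects the six var1 rows with var2 equal to B or C, and the median of their values is the average of 110 and 344.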

Desired output:

var1,var2,median
var1,A,227
var1,B,344
var1,C,110
var2,A,780
var2,B,190
var2,C,315

I am trying to write the pandas equivalent of a MEDIAN IFS. The Excel formula I use is given above.


Solution

  • EDIT - Completely rewrote this answer.

    I think you want this, given your data dictionary.

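    import itertools
    import pandas as pd

    df = pd.DataFrame(data)

    res = {'input1': [], 'input2': [], 'results': []}

    # product() visits every (var1, var2) combination; zip() would only pair them up.
    for i1, i2 in itertools.product(sorted(set(data['var1'])), sorted(set(data['var2']))):
        # Same var1 group, var2 NOT equal to i2 (the <> in the Excel formula).
        temp = df[(df['var1'] == i1) & (df['var2'] != i2)]
        row_median = temp['values'].median()
        res['input1'].append(i1)
        res['input2'].append(i2)
        res['results'].append(row_median)

    print(pd.DataFrame(res))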
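
Since the question asks about groupby, here is a minimal sketch of the same calculation driven by `groupby('var1')`. It assumes the intended rule is "same var1, different var2", as in the Excel formula, and only reports the var2 values that actually occur inside each var1 group. The names `rows` and `result` are illustrative.

```python
import pandas as pd

data = {'id': list(range(1, 16)),
        'var1': ['var1'] * 10 + ['var2'] * 5,
        'var2': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'C'],
        'var3': ['A'] * 15,
        'values': [10, 870, 1731, 80, 110, 3848, 3590, 344, 30, 60, 60, 190, 440, 780, 1460]}
df = pd.DataFrame(data)

rows = []
# groupby yields one sub-frame per var1 value, in sorted key order.
for var1_value, group in df.groupby('var1'):
    for var2_value in sorted(group['var2'].unique()):
        # Median of the group's values, excluding rows with this var2.
        med = group.loc[group['var2'] != var2_value, 'values'].median()
        rows.append({'var1': var1_value, 'var2': var2_value, 'median': med})

result = pd.DataFrame(rows)
print(result)
```

On the sample data this gives medians of 227, 344, 110 for var1 and 780, 190, 315 for var2, matching the desired output in the question.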