Tags: python, pandas, dataframe, formula, division

Divide dataframe column value by the total of the column


My question might be too easy for many of you, but I'm a beginner with Python.

I want the percentage of a value in a column that can hold 3 different possible values (1, 0, -1), while excluding one of those values (-1) from the calculation.

I tried this: (df['col_name']).sum()/len(df.col_name)

However, this also counts the -1 values, while I only want the share of 1s over the total, with the -1 rows left out of that total.

Thank you for your help.


Solution

  • To keep the -1 values out of the sum, replace -1 with missing values (note that len(df.col_name) still counts every row):

    df['col_name'].replace(-1, np.nan).sum()/len(df.col_name) 
    

    Or filter out the -1 values first, if the length should count only the rows of the filtered Series:

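    import numpy as np
    import pandas as pd

    # Build a reproducible sample column with the values 0, 1 and -1
    np.random.seed(123)
    df = pd.DataFrame({'col_name':np.random.choice([0,1,-1], size=10)})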
    
    print (df)
       col_name
    0        -1
    1         1
    2        -1
    3        -1
    4         0
    5        -1
    6        -1
    7         1
    8        -1
    9         1
    
    s = df.loc[df['col_name'] != -1, 'col_name']
    print (s)
    1    1
    4    0
    7    1
    9    1
    Name: col_name, dtype: int32
    
    print (s.sum()/len(s))
    0.75
    
    print (s.mean())
    0.75
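
To see how the two approaches differ in their denominator, here is a minimal self-contained sketch. It assumes the same ten values as the printed sample above, typed in by hand rather than generated randomly, and the variable names (masked, share_of_all_rows, share_excluding, share_filtered) are only illustrative. It relies on sum() and mean() skipping missing values by default.

```python
import numpy as np
import pandas as pd

# Assumption: the ten sample values from the printed frame, entered by hand
df = pd.DataFrame({'col_name': [-1, 1, -1, -1, 0, -1, -1, 1, -1, 1]})

# Replace -1 with NaN; sum() and mean() skip NaN by default
masked = df['col_name'].replace(-1, np.nan)

# Denominator is every row, including the rows that held -1
share_of_all_rows = masked.sum() / len(df)

# Denominator is only the non-missing rows (the 0s and 1s)
share_excluding = masked.mean()

# Boolean filtering reaches the same result as the masked mean
filtered = df.loc[df['col_name'] != -1, 'col_name']
share_filtered = filtered.mean()

print(share_of_all_rows, share_excluding, share_filtered)
```

If the -1 rows should be left out of the total as well, the masked mean() or the filtered mean() is probably the one you want; dividing by len(df) keeps those rows in the denominator.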