
How to run scipy.stats.gmean for rows containing values less than 1 and zeros?

We have the following DataFrame (df):


 #Gene  GSM772  GSM773  GSM774  GSM775  GSM776
0610007P14Rik    0.003485    0.003415    0.005431    0.003667    0.007146
0610009B22Rik    0.001220    0.001351    0.001762    0.001404    0.002177
0610009L18Rik    0.000055    0.000009    0.000152    0.000082    0.000179
0610009O20Rik    0.000000    0.006830    0.000000    0.006653    0.006907
0610010F05Rik    0.008310    0.008329    0.007091    0.006919    0.006915

We want to calculate the geometric mean of every row.

  • The result should be appended as the last column, named GeometricMean.

Some rows contain zero values. These should not cause an error; the geometric mean of such a row is simply taken to be zero.
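For reference, the geometric mean of n positive values is exp of the arithmetic mean of their logarithms, and SciPy's gmean works on logarithms (the traceback below shows it calling np.log). A zero makes the log -inf, the row mean becomes -inf, and exp(-inf) is 0, so a row containing a zero naturally comes out as 0. A minimal NumPy sketch of that formula; the helper name `row_gmean` is hypothetical, and the sample rows are copied from the table above:

```python
import numpy as np

def row_gmean(values):
    """Row-wise geometric mean computed as exp(mean(log(x))) along axis 1."""
    arr = np.asarray(values, dtype=float)
    # log(0) gives -inf and emits a divide-by-zero warning; silence it only here
    with np.errstate(divide="ignore"):
        logs = np.log(arr)
    # The mean of a row containing -inf is -inf, and exp(-inf) is 0.0
    return np.exp(logs.mean(axis=1))

rows = [
    [0.003485, 0.003415, 0.005431, 0.003667, 0.007146],  # 0610007P14Rik
    [0.000000, 0.006830, 0.000000, 0.006653, 0.006907],  # 0610009O20Rik (contains zeros)
]
print(row_gmean(rows))
```

The first value is about 0.0044, consistent with the GeometricMean column in the output further down, and the zero-containing row gives 0.0.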

We wrote the following Python script:

import numpy as np
from scipy import stats

# log(0) emits a divide-by-zero warning; ignore it so rows containing zeros get a gmean of 0
np.seterr(divide='ignore')

# Column 0 (#Gene) holds strings; columns 1 onward (GSM772 to GSM776) are numeric
geo_mean = stats.gmean(df.iloc[:, 1:], axis=1)

results = df.assign(GeometricMean=geo_mean)

  • The following error is encountered:

    AttributeError: 'str' object has no attribute 'log'

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "", line 99, in <module>
        scipy.stats.gmean(df.iloc[:,1:5],axis=1) #calculates gmean rowwise, axis=1 for rowwise
      File "/home/.local/lib/python3.6/site-packages/scipy/stats/", line 402, in gmean
        log_a = np.log(np.array(a, dtype=dtype))
    TypeError: loop of ufunc does not support argument 0 of type str which has no callable log method
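The traceback points at np.log(np.array(a, dtype=dtype)). When the data handed to gmean contains a string column (such as #Gene), the converted array has object dtype; NumPy then falls back to calling a .log() method on each element, and a Python str has none, which produces the same chained AttributeError/TypeError. A small sketch reproducing this with NumPy alone, assuming a mixed row taken from the table (the exact message wording may differ between NumPy versions):

```python
import numpy as np

# Gene name (str) mixed with numeric values, so the array has dtype=object
mixed = np.array([["0610007P14Rik", 0.003485, 0.003415]], dtype=object)

try:
    np.log(mixed)
except TypeError as exc:
    # For object arrays NumPy looks for a .log() method on each element;
    # the leading str has none, so the ufunc loop fails
    print("TypeError:", exc)
    raised = True
else:
    raised = False

# Dropping the string column and casting to float gives an array np.log handles
numeric = mixed[:, 1:].astype(float)
print(np.log(numeric))
```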

Can anyone please suggest the best way to resolve this issue?

Thanks!


  • Problem solved. The script above actually works without any issue; sorry, this question was posted too hastily. Questions cannot be deleted, so this one will stay here. Hopefully the script is useful to someone.

    Note that this script will not work if the columns passed to gmean include any column of strings. After removing those columns, the script generates the last column with the geometric mean of every row without any problem.

    (5, 6)
               #Gene  GSM772  GSM773  GSM774  GSM775  GSM776
    0  0610007P14Rik    0.003485    0.003415    0.005431    0.003667    0.007146
    1  0610009B22Rik    0.001220    0.001351    0.001762    0.001404    0.002177
    2  0610009L18Rik    0.000055    0.000009    0.000152    0.000082    0.000179
    3  0610009O20Rik    0.006369    0.006830    0.007176    0.006653    0.006907
    4  0610010F05Rik    0.008310    0.008329    0.007091    0.006919    0.006915
               #Gene  GSM772  GSM773  GSM774  GSM775  GSM776  GeometricMean
    0  0610007P14Rik    0.003485    0.003415    0.005431    0.003667    0.007146       0.004424
    1  0610009B22Rik    0.001220    0.001351    0.001762    0.001404    0.002177       0.001548
    2  0610009L18Rik    0.000055    0.000009    0.000152    0.000082    0.000179       0.000064
    3  0610009O20Rik    0.006369    0.006830    0.007176    0.006653    0.006907       0.006782
    4  0610010F05Rik    0.008310    0.008329    0.007091    0.006919    0.006915       0.007484
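One way to keep string columns out of the calculation without counting column positions is to select only the numeric columns with pandas' DataFrame.select_dtypes. A sketch, assuming pandas and SciPy are installed; it rebuilds a small df from rows of the input table at the top (including the zero-containing 0610009O20Rik row):

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "#Gene": ["0610007P14Rik", "0610009B22Rik", "0610009O20Rik"],
    "GSM772": [0.003485, 0.001220, 0.000000],
    "GSM773": [0.003415, 0.001351, 0.006830],
    "GSM774": [0.005431, 0.001762, 0.000000],
    "GSM775": [0.003667, 0.001404, 0.006653],
    "GSM776": [0.007146, 0.002177, 0.006907],
})

# Keep only numeric sample columns; the '#Gene' string column is excluded automatically
numeric = df.select_dtypes(include="number")

# Zeros make log() hit -inf; suppress the warning locally so those rows get 0.0
with np.errstate(divide="ignore"):
    gm = stats.gmean(numeric, axis=1)

results = df.assign(GeometricMean=gm)
print(results)
```

Using np.errstate limits the warning suppression to this one call, whereas np.seterr changes NumPy's error handling for the rest of the session.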