Search code examples
pythonrpandasrpy2

Python rpy2 - nls regression RRuntimeError


I am trying to do some nls regression using R within Python. I am getting stuck with a RRuntimeError and am getting to a point where I am way outside my expertise and have struggled for a few days to get it to work so would appreciate some help.

This is my csv of data: http://www.sharecsv.com/s/4cdd4f832b606d6616260f9dc0eedf38/ratedata.csv

This is my code:

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
pandas2ri.activate()

dfData = pd.read_csv('C:\\Users\\nick\\Desktop\\ratedata.csv')
rdf = pandas2ri.py2ri(dfData)

a = 0.5
b = 1.1
count = rdf.rx(True, 'Trials')
rates = rdf.rx(True, 'Successes')

base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})

my_formula = stats.as_formula('rates ~ 1-(1/(10^(a * count ^ (b-1))))')

d = ro.ListVector({'a': a, 'b': b})

fit = stats.nls(my_formula, weights=count, start=d)

Everything is compiling apart from:

fit = stats.nls(my_formula, weights=count, start=d)

I am getting the following traceback:

---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
<ipython-input-12-3f7fcd7d7851> in <module>()
      6 d = ro.ListVector({'a': a, 'b': b})
      7 
----> 8 fit = stats.nls(my_formula, weights=count, start=d)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
    176                 v = kwargs.pop(k)
    177                 kwargs[r_k] = v
--> 178         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
    179 
    180 pattern_link = re.compile(r'\\link\{(.+?)\}')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
    104         for k, v in kwargs.items():
    105             new_kwargs[k] = conversion.py2ri(v)
--> 106         res = super(Function, self).__call__(*new_args, **new_kwargs)
    107         res = conversion.ri2ro(res)
    108         return res

RRuntimeError: Error in (function (formula, data = parent.frame(), start, control = nls.control(),  : 
  parameters without starting value in 'data': rates, count

I would be eternally thankful if anyone can see where I am going wrong, or can offer advice. All I want is the two numbers from that formula back in Python so I can use those to construct some confidence intervals.

Thank you


Solution

  • Consider incorporating all your formula variables into a single dataframe and use the data argument. The as_formula call looks in the R environment but rates and count are in the Python scope. Hence, contain all items in same object. Then run your nls with either the Pandas dataframe or R dataframe:

    import pandas as pd
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr
    from rpy2.robjects import pandas2ri
    
    base = importr('base', robject_translations={'with': '_with'})
    stats = importr('stats', robject_translations={'format_perc': '_format_perc'})
    
    a = 0.05
    b = 1.1
    d = ro.ListVector({'a': a, 'b': b})
    
    dfData = pd.read_csv('Input.csv')
    dfData['count'] = dfData['Trials'].astype('float')
    dfData['rates'] = dfData['Successes'] / dfData['Trials']
    dfData['a'] = a
    dfData['b'] = b
    
    pandas2ri.activate()
    
    rdf = pandas2ri.py2ri(dfData)
    
    my_formula = stats.as_formula('rates ~ 1-(1/(10^(a * count ^ (b-1))))')
    
    # WITH PANDAS DATAFRAME
    fit = stats.nls(formula=my_formula, data=dfData, weights=dfData['count'], start=d)
    print(fit)
    
    # WITH R DATAFRAME
    fit = stats.nls(formula=my_formula, data=rdf, weights=rdf.rx(True, 'count'), start=d)
    print(fit)
    

    Alternatively, you can use robjects.globalenv and not use data argument:

    ro.globalenv['rates'] = dfData['rates']
    ro.globalenv['count'] = dfData['count']
    ro.globalenv['a'] = dfData['a']
    ro.globalenv['b'] = dfData['b']
    
    fit = stats.nls(formula=my_formula, weights=dfData['count'], start=d)
    print(fit)
    
    # Nonlinear regression model    
    #   model: rates ~ 1 - (1/(10^(a * count^(b - 1))))    
    #    data: parent.frame()
    
    #       a       b     
    # 0.01043 1.24943     
    #  weighted residual sum-of-squares: 14.37       
    
    # Number of iterations to convergence: 6     
    # Achieved convergence tolerance: 9.793e-07
    
    # To return parameters    
    num = fit.rx('m')[0].names.index('getPars')
    obj = fit.rx('m')[0][num]()
    
    print(obj[0])
    # 0.010425686223717435
    
    print(obj[1])
    # 1.2494303314553932
    

    Equivalently in R:

    dfData <- read.csv('Input.csv')
    
    a <- .05
    b <- 1.1  
    d <- list(a=a, b=b)
    
    dfData$count <- dfData$Trials
    dfData$rates <- dfData$Successes / dfData$Trials
    dfData$a <- a
    dfData$b <- b
    
    my_formula <- stats::as.formula("rates ~ 1-(1/(10^(a * count ^ (b-1))))")
    
    fit <- stats::nls(my_formula, data=dfData, weights=dfData$count, start=d)
    print(fit)
    
    # Nonlinear regression model
    #   model: rates ~ 1 - (1/(10^(a * count^(b - 1))))
    #    data: dfData
    #       a       b 
    # 0.01043 1.24943 
    #  weighted residual sum-of-squares: 14.37
    
    # Number of iterations to convergence: 6 
    # Achieved convergence tolerance: 9.793e-07
    
    # To return parameters  
    fit$m$getPars()['a']
    # 0.01042569 
    
    fit$m$getPars()['b']
    # 1.24943