Python rpy2 - nls regression RRuntimeError

I am trying to do some nls regression using R within Python. I am getting stuck with a RRuntimeError and am getting to a point where I am way outside my expertise and have struggled for a few days to get it to work so would appreciate some help.

This is my csv of data: http://www.sharecsv.com/s/4cdd4f832b606d6616260f9dc0eedf38/ratedata.csv

This is my code:

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
pandas2ri.activate()

dfData = pd.read_csv('C:\\Users\\nick\\Desktop\\ratedata.csv')
rdf = pandas2ri.py2ri(dfData)

a = 0.5
b = 1.1
count = rdf.rx(True, 'Trials')
rates = rdf.rx(True, 'Successes')

base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})

my_formula = stats.as_formula('rates ~ 1-(1/(10^(a * count ^ (b-1))))')

d = ro.ListVector({'a': a, 'b': b})

fit = stats.nls(my_formula, weights=count, start=d)

Everything is compiling apart from:

fit = stats.nls(my_formula, weights=count, start=d)

I am getting the following traceback:

---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
<ipython-input-12-3f7fcd7d7851> in <module>()
      6 d = ro.ListVector({'a': a, 'b': b})
      7 
----> 8 fit = stats.nls(my_formula, weights=count, start=d)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
    176                 v = kwargs.pop(k)
    177                 kwargs[r_k] = v
--> 178         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
    179 
    180 pattern_link = re.compile(r'\\link\{(.+?)\}')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
    104         for k, v in kwargs.items():
    105             new_kwargs[k] = conversion.py2ri(v)
--> 106         res = super(Function, self).__call__(*new_args, **new_kwargs)
    107         res = conversion.ri2ro(res)
    108         return res

RRuntimeError: Error in (function (formula, data = parent.frame(), start, control = nls.control(),  : 
  parameters without starting value in 'data': rates, count

I would be eternally thankful if anyone can see where I am going wrong, or can offer advice. All I want is the two numbers from that formula back in Python so I can use those to construct some confidence intervals.

Thank you

Solution

Consider incorporating all your formula variables into a single dataframe and use the data argument. The as_formula call looks in the R environment but rates and count are in the Python scope. Hence, contain all items in same object. Then run your nls with either the Pandas dataframe or R dataframe:

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri

base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})

a = 0.05
b = 1.1
d = ro.ListVector({'a': a, 'b': b})

dfData = pd.read_csv('Input.csv')
dfData['count'] = dfData['Trials'].astype('float')
dfData['rates'] = dfData['Successes'] / dfData['Trials']
dfData['a'] = a
dfData['b'] = b

pandas2ri.activate()

rdf = pandas2ri.py2ri(dfData)

my_formula = stats.as_formula('rates ~ 1-(1/(10^(a * count ^ (b-1))))')

# WITH PANDAS DATAFRAME
fit = stats.nls(formula=my_formula, data=dfData, weights=dfData['count'], start=d)
print(fit)

# WITH R DATAFRAME
fit = stats.nls(formula=my_formula, data=rdf, weights=rdf.rx(True, 'count'), start=d)
print(fit)

Alternatively, you can use robjects.globalenv and not use data argument:

ro.globalenv['rates'] = dfData['rates']
ro.globalenv['count'] = dfData['count']
ro.globalenv['a'] = dfData['a']
ro.globalenv['b'] = dfData['b']

fit = stats.nls(formula=my_formula, weights=dfData['count'], start=d)
print(fit)

# Nonlinear regression model    
#   model: rates ~ 1 - (1/(10^(a * count^(b - 1))))    
#    data: parent.frame()

#       a       b     
# 0.01043 1.24943     
#  weighted residual sum-of-squares: 14.37       

# Number of iterations to convergence: 6     
# Achieved convergence tolerance: 9.793e-07

# To return parameters    
num = fit.rx('m')[0].names.index('getPars')
obj = fit.rx('m')[0][num]()

print(obj[0])
# 0.010425686223717435

print(obj[1])
# 1.2494303314553932

Equivalently in R:

dfData <- read.csv('Input.csv')

a <- .05
b <- 1.1  
d <- list(a=a, b=b)

dfData$count <- dfData$Trials
dfData$rates <- dfData$Successes / dfData$Trials
dfData$a <- a
dfData$b <- b

my_formula <- stats::as.formula("rates ~ 1-(1/(10^(a * count ^ (b-1))))")

fit <- stats::nls(my_formula, data=dfData, weights=dfData$count, start=d)
print(fit)

# Nonlinear regression model
#   model: rates ~ 1 - (1/(10^(a * count^(b - 1))))
#    data: dfData
#       a       b 
# 0.01043 1.24943 
#  weighted residual sum-of-squares: 14.37

# Number of iterations to convergence: 6 
# Achieved convergence tolerance: 9.793e-07

# To return parameters  
fit$m$getPars()['a']
# 0.01042569 

fit$m$getPars()['b']
# 1.24943