I am trying to do some nls regression using R within Python. I am getting stuck with a RRuntimeError
and am getting to a point where I am way outside my expertise and have struggled for a few days to get it to work so would appreciate some help.
This is my csv of data: http://www.sharecsv.com/s/4cdd4f832b606d6616260f9dc0eedf38/ratedata.csv
This is my code:
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
pandas2ri.activate()
dfData = pd.read_csv('C:\\Users\\nick\\Desktop\\ratedata.csv')
rdf = pandas2ri.py2ri(dfData)
a = 0.5
b = 1.1
count = rdf.rx(True, 'Trials')
rates = rdf.rx(True, 'Successes')
base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})
my_formula = stats.as_formula('rates ~ 1-(1/(10^(a * count ^ (b-1))))')
d = ro.ListVector({'a': a, 'b': b})
fit = stats.nls(my_formula, weights=count, start=d)
Everything is compiling apart from:
fit = stats.nls(my_formula, weights=count, start=d)
I am getting the following traceback:
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-12-3f7fcd7d7851> in <module>()
6 d = ro.ListVector({'a': a, 'b': b})
7
----> 8 fit = stats.nls(my_formula, weights=count, start=d)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
176 v = kwargs.pop(k)
177 kwargs[r_k] = v
--> 178 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
179
180 pattern_link = re.compile(r'\\link\{(.+?)\}')
~\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
104 for k, v in kwargs.items():
105 new_kwargs[k] = conversion.py2ri(v)
--> 106 res = super(Function, self).__call__(*new_args, **new_kwargs)
107 res = conversion.ri2ro(res)
108 return res
RRuntimeError: Error in (function (formula, data = parent.frame(), start, control = nls.control(), :
parameters without starting value in 'data': rates, count
I would be eternally thankful if anyone can see where I am going wrong, or can offer advice. All I want is the two numbers from that formula back in Python so I can use those to construct some confidence intervals.
Thank you
Consider incorporating all your formula variables into a single dataframe and use the data argument. The as_formula
call looks in the R environment but rates and count are in the Python scope. Hence, contain all items in same object. Then run your nls
with either the Pandas dataframe or R dataframe:
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})
a = 0.05
b = 1.1
d = ro.ListVector({'a': a, 'b': b})
dfData = pd.read_csv('Input.csv')
dfData['count'] = dfData['Trials'].astype('float')
dfData['rates'] = dfData['Successes'] / dfData['Trials']
dfData['a'] = a
dfData['b'] = b
pandas2ri.activate()
rdf = pandas2ri.py2ri(dfData)
my_formula = stats.as_formula('rates ~ 1-(1/(10^(a * count ^ (b-1))))')
# WITH PANDAS DATAFRAME
fit = stats.nls(formula=my_formula, data=dfData, weights=dfData['count'], start=d)
print(fit)
# WITH R DATAFRAME
fit = stats.nls(formula=my_formula, data=rdf, weights=rdf.rx(True, 'count'), start=d)
print(fit)
Alternatively, you can use robjects.globalenv
and not use data
argument:
ro.globalenv['rates'] = dfData['rates']
ro.globalenv['count'] = dfData['count']
ro.globalenv['a'] = dfData['a']
ro.globalenv['b'] = dfData['b']
fit = stats.nls(formula=my_formula, weights=dfData['count'], start=d)
print(fit)
# Nonlinear regression model
# model: rates ~ 1 - (1/(10^(a * count^(b - 1))))
# data: parent.frame()
# a b
# 0.01043 1.24943
# weighted residual sum-of-squares: 14.37
# Number of iterations to convergence: 6
# Achieved convergence tolerance: 9.793e-07
# To return parameters
num = fit.rx('m')[0].names.index('getPars')
obj = fit.rx('m')[0][num]()
print(obj[0])
# 0.010425686223717435
print(obj[1])
# 1.2494303314553932
Equivalently in R:
dfData <- read.csv('Input.csv')
a <- .05
b <- 1.1
d <- list(a=a, b=b)
dfData$count <- dfData$Trials
dfData$rates <- dfData$Successes / dfData$Trials
dfData$a <- a
dfData$b <- b
my_formula <- stats::as.formula("rates ~ 1-(1/(10^(a * count ^ (b-1))))")
fit <- stats::nls(my_formula, data=dfData, weights=dfData$count, start=d)
print(fit)
# Nonlinear regression model
# model: rates ~ 1 - (1/(10^(a * count^(b - 1))))
# data: dfData
# a b
# 0.01043 1.24943
# weighted residual sum-of-squares: 14.37
# Number of iterations to convergence: 6
# Achieved convergence tolerance: 9.793e-07
# To return parameters
fit$m$getPars()['a']
# 0.01042569
fit$m$getPars()['b']
# 1.24943