I am trying to plot a power regression line in the formula y ~ a*x^b using ggplot in R.
It looks like I need to use the method = "nls" under stat_smooth
but am getting a warning message that missing or infinity values are produced when evaluating the model.
I can get it to work if I exclude some points from my data, but I can't see any difference between those points and the ones that work What is going on here?
Data
library(dplyr)
library(ggplot2)
mydf <- structure(list(x = c(3.09090909090909, 1.09677419354839, 1.47826086956522,
2.125, 5.23076923076923, 2.42857142857143, 45.3333333333333,
2.3448275862069, 2.51851851851852, 45.3333333333333, 6.18181818181818,
1.78947368421053, 9.71428571428572, 13.6, 2.95652173913043, 4,
11.3333333333333, 1.51111111111111, 3.23809523809524, 6.8, 22.6666666666667,
5.66666666666667, 1.83783783783784, 2.61538461538462, 1.30769230769231,
1.7, 7.55555555555556, 1.03030303030303, 2, 1.88888888888889,
1.17241379310345, 1.13333333333333, 1.4468085106383, 8.5, 2.26666666666667,
1.01492537313433, 1.61904761904762, 3.57894736842105, 4.68965517241379,
2.19354838709677, 3.77777777777778, 17, 4.25, 1.07936507936508,
1.33333333333333, 1.74358974358974, 1.28301886792453, 2.06060606060606,
4.68965517241379, 1.94285714285714, 2.83333333333333, 1.04615384615385,
2.72, 1.58139534883721, 1.23636363636364, 1.41666666666667, 1.65853658536585,
1.15254237288136, 1.11475409836066, 1.38775510204082, 1.0625,
1.36, 3.4, 1.19298245614035, 1.21428571428571, 1.25925925925926,
1.54545454545455), y = c(193.571428571429, 289, 249, 215.285714285714,
179.857142857143, 204, 141.142857142857, 206.142857142857, 201.428571428571,
141.142857142857, 178.285714285714, 224.714285714286, 165.714285714286,
160, 198.428571428571, 184.857142857143, 164.285714285714, 245.857142857143,
192.714285714286, 172, 150.714285714286, 179.142857142857, 223.571428571429,
200.714285714286, 259.285714285714, 230, 171.714285714286, 315.714285714286,
216, 220.571428571429, 277, 285, 250, 170.285714285714, 209.142857142857,
335.714285714286, 238.285714285714, 188.571428571429, 181.142857142857,
210.857142857143, 185.714285714286, 151.142857142857, 181.428571428571,
295.857142857143, 257.142857142857, 225.857142857143, 261.714285714286,
215.714285714286, 181.142857142857, 217.857142857143, 199.714285714286,
313.571428571429, 200.428571428571, 242.857142857143, 269.714285714286,
250.142857142857, 234.857142857143, 282, 285.142857142857, 253.285714285714,
309.857142857143, 253.857142857143, 189.142857142857, 274.714285714286,
273.285714285714, 261.857142857143, 244.142857142857)), row.names = c(NA, -67), class = "data.frame")
My code:
mydf %>%
ggplot(aes(x = x, y = y))+
geom_point()+
stat_smooth(
method = "nls",
formula = "y~a*x^b",
method.args = list(start= c(a = 1,b=1)),
se=FALSE
)
# Warning message:
# Computation failed in `stat_smooth()`:
# Missing value or an infinity produced when evaluating the model
It works if I exclude selected points:
mydf_trunc <- mydf[c(1:51, 53:58, 60, 63, 67),]
mydf_trunc%>%
ggplot(aes(x = x, y = y))+
geom_point()+
stat_smooth(
method = "nls",
formula = "y~a*x^b",
method.args = list(start= c(a = 1,b=1)),
se=FALSE
)
But I can't see any difference between the points that work and the ones that cause problems. In this figure the red points are the ones that cause the stat_smooth to fail
bad_pts <- mydf[c(52,59, 61, 62, 64, 65, 66),]
bad_pts
# # A tibble: 7 x 2
# x y
# <dbl> <dbl>
# 1 1.05 314.
# 2 1.11 285.
# 3 1.06 310.
# 4 1.36 254.
# 5 1.19 275.
# 6 1.21 273.
# 7 1.26 262.
The warning message says that these points are causing missing or infite values, but they appear to fit along the same curve as the rest of the points, and there aren't any missing values.
I suspect that a) the offending points are duplicates of some of the other points at least for their x values and thus giving you infinite slope in parts of the estimation process, and that b) the asymptote should be allowed to be above zero:
ggplot(data=mydf, aes(x = jitter(x), y = y))+
geom_point()+
geom_smooth(
method="nls",
formula= y~ a*x^(-b) +cc,
method.args = list(start= c(a = 1,b=2 ,cc=100)),
se=FALSE
)