I am trying to write a function that does the following: 1. Produces and x,y scatterplot 2. Add a loess curve 3. Add a curve based on an AIC model fitting procedure where the best fit model could be linear, quadratic, or cubic. I want only a single line drawn for this step (i.e., the best fit, not all 3 possibilities).
I can do steps 1 and 2, but can't make #3 work. Where am I going wrong? Sample data below, but I will be running this function on various data sets, some of which will vary in length and NAs.
maindata = structure(c(1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960,
1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971,
1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982,
1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, -1.54,
1.5, 3.26, -2.79, 0.54, -0.51, -2.12, 1.83, -1.88, 0.47, -1.05,
-2.16, -1.04, -1.77, -2.54, 1.67, -2.97, -2.58, NA, -0.08,
2.05, 0.27, 2.18, 0.01, NA, -2.08, -0.42, -0.23, -1.58, -0.55,
2.63, 0.38, 2.17, -3.09, 3.14, -3.01, -0.13, 2.38, 3.88, 1.14,
2.54, 1.71, 2.86, -1.11, -1.98, -0.93, 1.03, 2.25, 1.18, -1.91,
1, -0.09, 0.7, -1.35, -0.2, 1.35, 1.72, 0.72, -5.96, 2.95, -0.25,
NA, 47, 40, NA, 20, 70, 80, 30, 33, 40, 71, 63, 25, 66, 41, 25,
38, 18, 22, 60, 85, 30, 75, 25, 80, 65, 33, 85, 95, 45, 75, 19,
75, 27, 13, 14, 15, 99, 22, 10, NA, 20, 35, 17, 55, 35, 70, 47,
24, 45, 38, 50, 90, 60, 50, 100, 42, 34, 55, 10, 15, 90, NA), .Dim = c(62L,
3L), .Dimnames = list(NULL, c("year", "x_values", "y_values")))
dd_plot = function(x, y, yaxis, xaxis) {
subset.data = subset(maindata, x!="NA" & y!="NA");
plot(subset.data$y~subset.data$x, pch = 20, ylab = yaxis, xlab=xaxis)
lines(loess.smooth(subset.data$x, subset.data$y), col = "blue", lwd = 2, lty =2)
fit = stepAIC(lm(subset.data$y~subset.data$x+I(subset.data$x^2)+I(subset.data$x^3)))
lines(subset.data$x, predict(fit), col="red")
legend("topleft", c("Lowess","Best Fit Polynomial"), lty = c(2,1), col=c("blue","red"), bty="n", xjust = -0.2)
}
dd_plot(y = y_values, x = x_values, yaxis = "Y_label", xaxis = "X_label")
Well, you don't exactly describe which part "doesn't work" or in which way it doesn't match you expectations. You sample has a bunch of mismatching variable names which perhaps your real data doesn't, bit it's looking for x
and y
in maindata which actually has x_values
and y_values
. Then your call to dd_plot
references variables x_values
and y_values
which don't exist.
But i'll assume you got past that are talking about the ball of string that's plotted rather than the simple curve for the best fit polynomial. The problem is that your data should be sorted by x-value when drawing lines. R just connects consecutive points. So i think you want
lines(subset.data$x[order(subset.data$x)],
predict(fit)[order(subset.data$x)], col="red")
instead of plotting the unsorted data. That returns