What are possible reasons as to why this is happening? It always happens after the value 10.
A subset of the dataset around the area of interest before and after the regression was applied:
This is the ggplot2
call that I am using to generate the graph. The smoothing span used is 0.05.
dat <- read.csv("before_loess.csv", stringsAsFactors = FALSE)
smoothed.data <- applyLoessSmooth(dat, 0.05) # dat is the dataset before being smoothed
scan.plot.data <- melt(smoothed.data, id.vars = "sample.diameters", variable.name = 'series')
scan.plot <- ggplot(data = scan.plot.data, aes(sample.diameters, value)) +
geom_line(aes(colour = series)) +
xlab("Diameters (nm)") +
ylab("Concentration (dN#/cm^2)") +
theme(plot.title = element_text(hjust = 0.5))
Function used to apply the loess filter:
applyLoessSmooth <- function(raw.data, smoothing.span) {
raw.data <- raw.data[complete.cases(raw.data),]
## response
vars <- colnames(raw.data)
## covariate
id <- 1:nrow(raw.data)
## define a loess filter function (fitting loess regression line)
loess.filter <- function (x, given.data, span) loess(formula = as.formula(paste(x, "id", sep = "~")),
data = given.data,
degree = 1,
span = span)$fitted
## apply filter column-by-column
loess.graph.data <- as.data.frame(lapply(vars, loess.filter, given.data = raw.data, span = smoothing.span),
col.names = colnames(raw.data))
sample.rows <- length(loess.graph.data[1])
loess.graph.data <- loess.graph.data %>% mutate("sample.diameters" = raw.data$sample.diameters[1:nrow(raw.data)])
}
The first problem is simply that your data is rounded to three significant figures. Below 10, the values on your x axis scan.plot.data$sample.diameters
increase in 0.01 increments, which produces a smooth curve on the chart, but after 10 they increase in 0.1 increments, which shows up as visible steps on the chart.
The second problem is that you should be regressing against the values of sample.diameters
, rather than against the row numbers id
. I think this is causing there to be multiple smoothed values for each distinct value of x - hence the steps. Here are a couple of suggested small modifications to your function...
applyLoessSmooth <- function(raw.data, smoothing.span) {
raw.data <- raw.data[complete.cases(raw.data),]
vars <- colnames(raw.data)
vars <- vars[vars != "sample.diameters"] #you are regressing against this, so exclude it from vars
loess.filter <- function (x, given.data, span) loess(
formula = as.formula(paste(x, "sample.diameters", sep = "~")), #not 'id'
data = given.data,
degree = 1,
span = span)$fitted
loess.graph.data <- as.data.frame(lapply(vars, loess.filter, given.data = raw.data,
span = smoothing.span),
col.names = vars) #final argument edited
loess.graph.data$sample.diameters <- raw.data$sample.diameters #simplified
return(loess.graph.data)
}
All of which seems to do the trick...
Of course, you could have just done this...
dat.melt <- melt(dat, id.vars = "sample.diameters", variable.name = 'series')
ggplot(data = dat.melt, aes(sample.diameters, value, colour=series)) +
geom_smooth(method="loess", span=0.05, se=FALSE)