Tags: r, ggplot2, regression, loess

R: Loess regression produces a staircase-like graph, rather than being smoothed, after the value 10


What could cause the smoothed curve to turn into a staircase? It always happens after the value 10 on the x axis.

A subset of the dataset around the area of interest before and after the regression was applied:

This is the ggplot2 call that I am using to generate the graph. The smoothing span used is 0.05.

library(ggplot2)
library(reshape2)  # assumption: melt() comes from reshape2

dat <- read.csv("before_loess.csv", stringsAsFactors = FALSE)

smoothed.data <- applyLoessSmooth(dat, 0.05) # dat is the dataset before being smoothed

scan.plot.data <- melt(smoothed.data, id.vars = "sample.diameters", variable.name = 'series')

scan.plot <- ggplot(data = scan.plot.data, aes(sample.diameters, value)) +
  geom_line(aes(colour = series)) +
  xlab("Diameters (nm)") +
  ylab("Concentration (dN#/cm^2)") +
  theme(plot.title = element_text(hjust = 0.5))

Function used to apply the loess filter:

applyLoessSmooth <- function(raw.data, smoothing.span) {
  raw.data <- raw.data[complete.cases(raw.data),]

  ## response
  vars <- colnames(raw.data)
  ## covariate
  id <- 1:nrow(raw.data)
  ## define a loess filter function (fitting loess regression line)
  loess.filter <- function (x, given.data, span) loess(formula = as.formula(paste(x, "id", sep = "~")),
                                           data = given.data,
                                           degree = 1,
                                           span = span)$fitted 
  ## apply filter column-by-column
  loess.graph.data <- as.data.frame(lapply(vars, loess.filter, given.data = raw.data, span = smoothing.span),
                           col.names = colnames(raw.data))
  sample.rows <- length(loess.graph.data[1])
  loess.graph.data <- loess.graph.data %>% mutate("sample.diameters" = raw.data$sample.diameters[1:nrow(raw.data)])

}

Solution

  • The first problem is simply that your data are rounded to three significant figures. Below 10, the x-axis values in scan.plot.data$sample.diameters increase in 0.01 increments, which produces a smooth curve on the chart, but above 10 they increase in 0.1 increments, which shows up as visible steps.

    The second problem is that you should be regressing against the values of sample.diameters rather than against the row numbers id. Rows that share the same rounded diameter still have different id values, so I think the fit ends up with several different smoothed values for each distinct value of x, hence the steps. Here are a couple of small suggested modifications to your function:

    applyLoessSmooth <- function(raw.data, smoothing.span) {
      # drop rows containing missing values
      raw.data <- raw.data[complete.cases(raw.data), ]
      # response columns: everything except the covariate you are regressing against
      vars <- colnames(raw.data)
      vars <- vars[vars != "sample.diameters"]
      # fit a local-linear loess of one column against sample.diameters (not 'id')
      loess.filter <- function(x, given.data, span) {
        loess(formula = as.formula(paste(x, "sample.diameters", sep = "~")),
              data = given.data,
              degree = 1,
              span = span)$fitted
      }
      # apply the filter column by column
      loess.graph.data <- as.data.frame(lapply(vars, loess.filter, given.data = raw.data,
                                               span = smoothing.span),
                                        col.names = vars) # name the columns after the responses only
      loess.graph.data$sample.diameters <- raw.data$sample.diameters # simplified
      return(loess.graph.data)
    }

    All of which seems to do the trick.

    Of course, you could also skip the custom function and let ggplot2 do the loess smoothing itself:

    dat.melt <- melt(dat, id.vars = "sample.diameters", variable.name = 'series')
    ggplot(data = dat.melt, aes(sample.diameters, value, colour = series)) +
      # add method.args = list(degree = 1) to match the local-linear fit used above
      geom_smooth(method = "loess", span = 0.05, se = FALSE)
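
If you want to convince yourself of both points without the original file, here is a minimal sketch on made-up data. Everything in it is an assumption for illustration: the data frame `d`, its columns `id`, `x` and `y`, the helper `spread()`, and the span of 0.2 (chosen to suit the synthetic data, not your 0.05). It rounds a covariate to three significant figures, checks the spacing of the distinct values on either side of 10, and then compares a loess fit against the row index with a fit against the rounded covariate.

```r
# Synthetic illustration only: data, names and span are hypothetical.
set.seed(1)
true.x <- seq(8, 14, by = 0.005)
n <- length(true.x)

d <- data.frame(
  id = seq_len(n),                        # row index, as in the original function
  x  = signif(true.x, 3),                 # covariate rounded to 3 significant figures
  y  = sin(true.x) + rnorm(n, sd = 0.1)   # noisy signal
)

# Spacing between distinct x values on either side of 10
ux <- unique(d$x)
steps.below <- diff(ux[ux < 10])
steps.above <- diff(ux[ux > 10])
round(range(steps.below), 3)
round(range(steps.above), 3)

# Same loess settings, two different covariates
fit.id <- loess(y ~ id, data = d, degree = 1, span = 0.2)$fitted
fit.x  <- loess(y ~ x,  data = d, degree = 1, span = 0.2)$fitted

# Largest spread of fitted values among rows that share the same x
spread <- function(fit) max(tapply(fit, d$x, function(v) diff(range(v))))
spread(fit.id)  # rows with the same x get different fitted values
spread(fit.x)   # rows with the same x get the same fitted value
```

When the fit against `id` is plotted against `x`, each repeated x value carries a column of different fitted values, which geom_line draws as a vertical jump. The fit against `x` gives one value per distinct x, so the line stays smooth apart from the coarser 0.1 spacing.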