
How to calculate the slope of a smoothed line with time as x-axis in R


My data looks like this:

   Time                   Type zone price no_activities   id 
1  2014/10/11 12:30:00 am  A    X    20    10              1
2  2014/10/12 12:30:00 am  A    X    20    10              2
3  2014/10/13 12:30:00 am  B    X    10     9              3
4  2014/10/14 12:30:00 am  D    X     5    12              4
5  2014/10/15 12:30:00 am  D    Y     6     5              5
6  2014/10/16 12:30:00 am  B    Y     7     8              6
7  2014/10/17 12:30:00 am  B    Y     7     8              7
8  2014/10/18 12:30:00 am  A    Y     9     5              8
9  2014/10/19 12:30:00 am  C    Y    20    23              9

I am able to draw the smoothed lines using the code below, and I would like to calculate the slope of each line.

subdf1 <- df1 %>% 
  mutate(day = as.Date(Time)) %>%
  group_by(zone, day, Type) %>%
  summarize(dailyact = sum(no_activities, na.rm = TRUE))

ggplot(subdf1, aes(x=day, y= dailyact, color = Type)) + 
  scale_y_log10() +
  geom_smooth(method = "lm", se=FALSE, size =0.5) +
  facet_wrap( ~ zone)

Code I used to calculate the slope of the line:

slope = diff(subdf1$dailyact)/diff(subdf1$day)

However, the x-axis variable Time is in "POSIXct" "POSIXt" format. I get the error below when trying to calculate the slope:

Error in `/.difftime`(diff(subdf1$dailyact), diff(subdf1$day)) : second argument of / cannot be a "difftime" object

Does anyone know a way to do this? Thank you very much.


Solution

  • I assume you want to calculate a slope by zone (because that's what you do with geom_smooth in your ggplot). In that case, I would suggest fitting a linear model to your data by zone, and then extracting the slope parameters.

    A tidyverse approach would look like this:

    library(tidyverse)
    subdf1 %>%
        group_by(zone) %>%
        nest() %>%
        mutate(slope = map_dbl(data, ~coef(lm(dailyact ~ day, data = .x))[2]))
    

    You don't provide enough data here for me to show a sensible output (all the slope parameters are NA based on the sample data you give).
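
    Note that the plot in the question colours by Type and uses scale_y_log10(), so each drawn line is a fit of log10(dailyact) against day within one zone/Type combination. A minimal sketch of matching slopes, assuming that is what is wanted, could look like the following. The `toy` data frame is made up for illustration and stands in for `subdf1`.

    ```r
    library(dplyr)
    library(tidyr)
    library(purrr)

    # Hypothetical toy data (not the asker's): two zone/Type groups, four days each
    toy <- tibble(
        zone = rep(c("X", "Y"), each = 4),
        Type = rep(c("A", "B"), each = 4),
        day = rep(as.Date("2014-10-11") + 0:3, times = 2),
        dailyact = c(1, 10, 100, 1000, 8, 4, 2, 1)
    )

    # One linear fit per zone/Type group, on the log10 scale used in the plot;
    # the second coefficient is the slope in log10 units per day
    slopes <- toy %>%
        group_by(zone, Type) %>%
        nest() %>%
        mutate(slope = map_dbl(data, ~coef(lm(log10(dailyact) ~ day, data = .x))[2])) %>%
        ungroup()
    ```

    Under this assumption, 10^slope is the multiplicative change in dailyact per day. Groups with fewer than two distinct days cannot yield a slope estimate.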


    Since your sample data is a bit too small to give sensible results, here is another example, based on the iris dataset.

    First, let's show a plot of Sepal.Length as a function of Petal.Length. We use geom_smooth to show results from a linear fit to the data for every Species.

    iris %>%
        ggplot(aes(Petal.Length, Sepal.Length)) +
        geom_point() +
        geom_smooth(method = "lm") +
        facet_wrap(~ Species)
    

    To get the slope estimates of the linear model fits we do the following

    iris %>%
        group_by(Species) %>%
        nest() %>%
        mutate(slope = map_dbl(data, ~coef(lm(Sepal.Length ~ Petal.Length, data = .x))[2]))
    ## A tibble: 3 x 3
    #  Species    data              slope
    #  <fct>      <list>            <dbl>
    #1 setosa     <tibble [50 × 4]> 0.542
    #2 versicolor <tibble [50 × 4]> 0.828
    #3 virginica  <tibble [50 × 4]> 0.996
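
    As for the error in the question: diff() on a date or date-time column returns a difftime object, and the message says that dividing by a difftime is not defined. If point-to-point slopes are still wanted, a minimal sketch is to convert the time differences to plain numbers first. The two vectors below are made up for illustration.

    ```r
    # Hypothetical values, for illustration only
    day <- as.Date(c("2014-10-11", "2014-10-12", "2014-10-14"))
    dailyact <- c(10, 12, 8)

    # diff() on dates returns a difftime; convert it to a number of days before dividing
    dt_days <- as.numeric(diff(day), units = "days")
    slope <- diff(dailyact) / dt_days
    ```

    This gives the slope between consecutive observations, not the slope of a fitted line, and it is only meaningful within a single group: applied to the whole of subdf1 it would also take differences across zone boundaries.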