
Line break when no data in ggplot2


I am using R to plot some data.

Date <- c("07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00",
      "07/12/2012 08:00:00","07/12/2012 10:00:00","07/12/2012 11:00:00")
Date <- as.POSIXct(Date, format = "%d/%m/%Y %H:%M:%S")
Counts <- c("0","3","10","6","5","4")
Counts <- as.numeric(Counts)
df1 <- data.frame(Date,Counts,stringsAsFactors = FALSE)
library(ggplot2)
g = ggplot(df1, aes(x=Date, y=Counts)) + geom_line(aes(group = 1))
g

How do I ask R not to plot data as a continuous line when there is a break in time? I normally have a data point every hour, but sometimes there is a break (between 8 am and 10 am). Between these points, I don't want the line to connect. Is this possible in R?

Edit

Many thanks for the responses here. My data is now in 10-second intervals, and I wish to do the same analysis using this data.

df <- structure(list(Date = c("11/12/2012", "11/12/2012", "11/12/2012", 
                     "11/12/2012", "11/12/2012", "11/12/2012", "11/12/2012", 
                     "11/12/2012", "11/12/2012", "11/12/2012", "11/12/2012"),
                     Time = c("20:16:00", "20:16:10", "20:16:20", "20:16:30", 
                     "20:16:40", "20:16:50", "20:43:30", "20:43:40", 
                     "20:43:50", "20:44:00", "20:44:10"),
                     Axis1 = c(181L, 14L, 65L, 79L, 137L, 104L, 7L, 0L, 0L, 
                     14L, 0L),
                     Steps = c(13L, 1L, 6L, 3L, 8L, 4L, 1L, 0L, 0L, 0L, 0L)),
                .Names = c("Date", "Time", "Axis1", "Steps"),
                row.names = c(57337L, 57338L, 57339L, 57340L, 57341L, 57342L, 
                57502L, 57503L, 57504L, 57505L, 57506L), class = "data.frame")

I think I understand what the code is trying to do when it adds the 'group' column to the original data frame, but how do I tell R that the data is now in 10-second intervals? When I apply the first line of code, which determines whether the times are consecutive or whether there is a gap (e.g. idx <- c(1, diff(df$Time))), I get the following error:

Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] : 
  non-numeric argument to binary operator

Do I need to convert my Time variable with as.POSIXct so that R recognises the times correctly?


Solution

  • You need a grouping variable: give the points that should be connected a common value, and map that variable to the group aesthetic. Here, you can set the first 4 values to 1 and the last 2 to 2, stored as a factor. That is,

    df1$grp <- factor(rep(1:2, c(4,2)))
    g <- ggplot(df1, aes(x=Date, y=Counts)) + geom_line(aes(group = grp)) + 
                         geom_point()
    

    Edit: Once you have your data.frame loaded, you can use this code to automatically generate the grp column (the rows must be sorted by Date, and the expected step between rows is 1 hour):

    idx <- c(1, diff(df1$Date))
    i2 <- c(1,which(idx != 1), nrow(df1)+1)
    df1$grp <- rep(1:length(diff(i2)), diff(i2))
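
    To see what each step produces, here is a minimal sketch that traces the three lines on the hourly data from the question. The tz = "UTC" argument, the explicit units = "hours" conversion, and the run_lengths name are additions for illustration and are not part of the code above.

    ```r
    # Rebuild the hourly example from the question (08:00 -> 10:00 is the gap).
    Date <- as.POSIXct(c("07/12/2012 05:00:00", "07/12/2012 06:00:00",
                         "07/12/2012 07:00:00", "07/12/2012 08:00:00",
                         "07/12/2012 10:00:00", "07/12/2012 11:00:00"),
                       format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
    df1 <- data.frame(Date, Counts = c(0, 3, 10, 6, 5, 4))

    # Step between consecutive rows, in hours; the leading 1 stands in for row 1.
    idx <- c(1, as.numeric(diff(df1$Date), units = "hours"))   # 1 1 1 1 2 1

    # Row numbers where a new run starts, plus a sentinel one past the last row.
    i2 <- c(1, which(idx != 1), nrow(df1) + 1)                  # 1 5 7

    # Length of each run, then one group label per row.
    run_lengths <- diff(i2)                                     # 4 2
    df1$grp <- rep(seq_along(run_lengths), run_lengths)         # 1 1 1 1 2 2
    ```

    Asking for the units explicitly is a defensive choice: diff() on date-times returns a difftime whose units are picked automatically, so the comparison against 1 only means "1 hour" when the units happen to be hours.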
    

    Note: It is important to add geom_point() as well, because a group that contains a single point (for example, an isolated LAST entry in the data.frame) won't be drawn by geom_line(), which needs at least 2 points to connect. In this case, geom_point() will still plot it.

    As an example, I'll generate data with more gaps:

    # build test data: hourly timestamps with a few rows removed
    set.seed(1234)
    df <- data.frame(Date=seq(as.POSIXct("05:00", format="%H:%M"), 
                    as.POSIXct("23:00", format="%H:%M"), by="hours"))
    df$Counts <- sample(19)
    df <- df[-c(4,7,17,18),]
    
    # generate the groups automatically and plot
    idx <- c(1, diff(df$Date))
    i2 <- c(1,which(idx != 1), nrow(df)+1)
    df$grp <- rep(1:length(diff(i2)), diff(i2))
    g <- ggplot(df, aes(x=Date, y=Counts)) + geom_line(aes(group = grp)) + 
                geom_point()
    g
    

    (Image: ggplot2_groups, the plot produced by the code above.)

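    Edit: For your NEW data (assuming it is df), first combine Date and Time into a single date-time column. The error comes from calling diff() on the character Time column:

    df$t <- as.POSIXct(paste(df$Date, df$Time), format="%d/%m/%Y %H:%M:%S")
    
    # consecutive rows should now be 10 seconds apart
    idx <- c(10, as.numeric(diff(df$t), units="secs"))
    i2 <- c(1,which(idx != 10), nrow(df)+1)
    df$grp <- rep(1:length(diff(i2)), diff(i2))
    

    Now plot with aes(x=t, ...).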
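
    If the sampling interval is likely to change again, the grouping can be wrapped in a small function. This is only a sketch of a different way to get the same labels: gap_groups and step_secs are hypothetical names, it uses cumsum() rather than the rep()/diff() construction above, and it assumes the rows are sorted by time and that any gap longer than step_secs should start a new line segment. The data is the 10-second sample from the question, trimmed to the columns needed here.

    ```r
    # Hypothetical helper: label runs of consecutive timestamps.
    # A new group starts whenever the gap to the previous row exceeds step_secs.
    gap_groups <- function(times, step_secs) {
      gaps <- as.numeric(diff(times), units = "secs")  # always in seconds
      cumsum(c(TRUE, gaps > step_secs))                # 1 for the first run, 2 for the next, ...
    }

    # The 10-second data from the question (Date is recycled to every row).
    df <- data.frame(
      Date  = "11/12/2012",
      Time  = c("20:16:00", "20:16:10", "20:16:20", "20:16:30", "20:16:40",
                "20:16:50", "20:43:30", "20:43:40", "20:43:50", "20:44:00",
                "20:44:10"),
      Axis1 = c(181, 14, 65, 79, 137, 104, 7, 0, 0, 14, 0),
      stringsAsFactors = FALSE
    )

    # Combine date and time into one POSIXct column (tz = "UTC" is an assumption).
    df$t <- as.POSIXct(paste(df$Date, df$Time),
                       format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
    df$grp <- gap_groups(df$t, step_secs = 10)

    # Build the plot only if ggplot2 is installed; print(g) would draw it.
    if (requireNamespace("ggplot2", quietly = TRUE)) {
      g <- ggplot2::ggplot(df, ggplot2::aes(x = t, y = Axis1, group = grp)) +
        ggplot2::geom_line() +
        ggplot2::geom_point()
    }
    ```

    For the hourly data the same call would be gap_groups(df1$Date, step_secs = 3600).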