I have a data frame like this:
dput(xx)
structure(list(TimeStamp = structure(c(15705, 15706), class = "Date"),
Host = c("Host1", "Host2"), OS = structure(c(1L, 1L), .Label = "solaris", class = "factor"),
ID = structure(c(1L, 1L), .Label = "1234", class = "factor"),
Class = structure(c(1L, 1L), .Label = "Processor", class = "factor"),
Stat = structure(c(1L, 1L), .Label = "CPU", class = "factor"),
Instance = structure(c(1L, 1L), .Label = c("_Total", "CPU0",
"CPU1", "CPU10", "CPU11", "CPU12", "CPU13", "CPU14", "CPU15",
"CPU16", "CPU17", "CPU18", "CPU19", "CPU2", "CPU20", "CPU21",
"CPU22", "CPU23", "CPU3", "CPU4", "CPU5", "CPU6", "CPU7",
"CPU8", "CPU9"), class = "factor"), Average = c(4.39009345794392,
5.3152972972973), Min = c(3.35, -0.01), Max = c(5.15, 72.31
)), .Names = c("TimeStamp", "Host", "OS", "ID", "Class",
"Stat", "Instance", "Average", "Min", "Max"), row.names = c(NA,
-2L), class = "data.frame")
This data frame is huge and has many Hosts. The problem is that when a Host, like the ones above, does not have enough data points, the following ggplot call fails, complaining that there are too few data points to draw the graph:
ggplot(xx, aes(TimeStamp, Max, group = Host, colour = Host)) + geom_point() + geom_smooth(method = "loess")
How can I check whether a particular Host in this data frame has at least 10 data points and, if so, use method = "loess", falling back to method = "lm" when the Host has fewer than 10?
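The first half of the question (how many points each Host has) comes down to a per-Host row count. A minimal base-R sketch, assuming the data frame is called `xx` as in the `dput()` above; the small `xx` built here is a hypothetical stand-in, and only its `Host` column matters for the count:

```r
# Hypothetical stand-in for xx: Host1 has 12 rows, Host2 has 2
xx <- data.frame(Host = c(rep("Host1", 12), rep("Host2", 2)),
                 Max  = c(seq(3, 6, length.out = 12), 5.15, 72.31),
                 stringsAsFactors = FALSE)

# Number of rows (data points) per Host
counts <- table(xx$Host)

# "loess" for Hosts with 10 or more points, "lm" for the rest
method_for_host <- setNames(ifelse(as.vector(counts) >= 10, "loess", "lm"),
                            names(counts))
method_for_host
```

`as.vector()` drops the `table` class so the result is a plain named character vector.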
Yes, it is a bit tricky, but it seems to be possible:
# for reproducibility
set.seed(42)
# The idea is to first split the data into Hosts with < 10 and >= 10 points
# I use data.table for that
library(data.table)
dt <- data.frame(Host = rep(paste("Host", 1:10, sep=""), sample(1:20, 10)),
stringsAsFactors = FALSE)
dt <- transform(dt, x=sample(1:nrow(dt)), y = 15*(1:nrow(dt)))
dt <- data.table(dt, key="Host")
dt1 <- dt[, .SD[.N >= 10], by = Host]
dt2 <- dt[, .SD[.N < 10], by = Host]
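If data.table is not available, the same split can be sketched in base R. This assumes the same structure as `dt` (a `Host` column plus `x` and `y`); `ave()` attaches each row's group size, and `df1`/`df2` (names chosen here for illustration) mirror `dt1`/`dt2`:

```r
set.seed(42)
df <- data.frame(Host = rep(paste("Host", 1:10, sep = ""), sample(1:20, 10)),
                 stringsAsFactors = FALSE)
df <- transform(df, x = sample(1:nrow(df)), y = 15 * (1:nrow(df)))

# For every row, the number of rows that share its Host
n_per_host <- ave(seq_len(nrow(df)), df$Host, FUN = length)

df1 <- df[n_per_host >= 10, ]  # Hosts with at least 10 points -> loess
df2 <- df[n_per_host < 10, ]   # Hosts with fewer than 10 points -> lm
```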
# on to plotting now
library(ggplot2)
# Now dt1 holds all Hosts with >= 10 observations and dt2 all Hosts with < 10
# plot dt1 first, smoothing with loess
p <- ggplot(data = dt1, aes(x = x, y = y, group = Host)) + geom_line() +
  geom_smooth(method = "loess", se = TRUE)
# add geom_line for dt2 by passing its data and aes explicitly
# The TRICKY part: add a second geom_smooth with data = dt2 and method = "lm"
p <- p + geom_line(data = dt2, aes(x = x, y = y, group = Host)) +
  geom_smooth(data = dt2, method = "lm", se = TRUE)
p
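The same idea can be applied to the question's own columns (`TimeStamp`, `Max`, `Host`). The sketch below is a hypothetical helper (`plot_smooth_by_size` is not part of the original answer); it assumes ggplot2 is installed, uses base R's `ave()` instead of data.table for the per-Host row count, and runs on made-up sample data:

```r
library(ggplot2)

# Hypothetical helper: loess for Hosts with at least `min_n` rows, lm otherwise
plot_smooth_by_size <- function(d, min_n = 10) {
  # For every row, the number of rows that share its Host
  n <- ave(seq_len(nrow(d)), d$Host, FUN = length)
  big   <- d[n >= min_n, ]
  small <- d[n <  min_n, ]

  ggplot(d, aes(TimeStamp, Max, group = Host, colour = Host)) +
    geom_point() +
    geom_smooth(data = big,   method = "loess", formula = y ~ x) +
    geom_smooth(data = small, method = "lm",    formula = y ~ x)
}

# Made-up sample data: Host1 has 15 points, Host2 has 4
set.seed(1)
xx <- data.frame(
  TimeStamp = as.Date("2013-01-01") + c(0:14, 0:3),
  Host      = c(rep("Host1", 15), rep("Host2", 4)),
  Max       = c(runif(15, 3, 6), 5.15, 72.31, 10.2, 33.0),
  stringsAsFactors = FALSE
)
p <- plot_smooth_by_size(xx)
```

Calling `print(p)` draws the plot. Passing `formula = y ~ x` explicitly only silences the message newer ggplot2 versions print when the formula is left at its default.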
(This is an ugly example, but it gives you the idea.)