Continuation to this question: Add row in R dataframe by unique factor in column showing percent change by Month
testing <- data.frame(
MONTH = c("MTD: 12", "MTD: 12", "MTD: 11", "MTD: 12", "MTD: 12", "MTD: 12"),
YEAR = c(2012, 2013, 2014, 2015, 2013, 2014),
Client = c("A.", "A.", "A.", "B.", "B.", "B."),
Revenue = c(320, 205, 166L, 152, 150, 138),
Col1 = c(651, 485, 533, 3932, 171, 436),
Col2 = c(478, 335, 305, 238, 115, 251),
Col3 = c(73, 69, 57, 6, 67, 57),
Col4 = c(6.7, 6.1, 5.5, 6.4, 13.1, 5.5)
)
# subset just the month=12 rows
test12 <- testing[testing$MONTH=="MTD: 12", ]
test12 <- test12[order(test12$Client, test12$YEAR), ]
# define a function to calculate percent change
pctchange <- function(x) {
L <- length(x)
c(NA, 100 * (x[-1] - x[-L]) / x[-L])
}
# calculate percent change for all columns, by client
change <- apply(test12[, c("Revenue", "Col1", "Col2", "Col3", "Col4")], 2,
function(y) unlist(tapply(y, test12$Client, pctchange)))
change <- data.frame(change)
names(change) <- paste0("d", names(change))
test12b <- cbind(test12[, c("MONTH", "YEAR", "Client")], change)
# merge back with monthly data
merge(testing, test12b, all=TRUE)
So after running this code you get a list that has been splitted by client.
I want to run the following code that will essentially remove the 2nd row if the number of rows for that factor (the client) is greater than 2.
I tried this already and it didn't work:
testing<-ifelse(length(splitresult)>2,splitresult[-2,],splitresult)
Ultimate goal out of all of this:
1) To just get the percent change of the last year with the previous year and don't show the inbetween stuff like NA for the previous year. But if it is a new client I do want the NA there to specify it is a new client. So that is why I tried out the code above which didn't work.
2) I want to reorder the clients in the split by Revenue in MTD: 12 2014.
splitlist[order(sapply(splitlist, function(x) (x[["Revenue"]])))]
(Didn't work: assume splitlist is the name of the list)
If anyone can help me with either question it would be extremely helpful. Thanks!
I think the plyr
package would help here. For example, instead of the last line of code where you use ifelse
, you could try
library(plyr)
out = ddply(splitresult, "Client", function(x){
if(dim(x)[1] > 2) x = x[-2,]
return(x)
})
Here, x
is a client-specific data frame, and out
is the result of combining the rows of a bunch of client-specific data frames.
You might also check out lubridate
, which will make dates and times easier to deal with. As mentioned in the comments, dplyr
would also be helpful, as would the rest of the "Hadleyverse" of packages for cleaning and plotting data. The solutions to your questions 1 and 2, along with the whole process of cleaning and summarizing, will be much cleaner and easier with the right tools.