Search code examples
rsplitreorderlist

Reordering a list by element in list and remove specified rows in list


Continuation to this question: Add row in R dataframe by unique factor in column showing percent change by Month

testing <- data.frame(
  MONTH = c("MTD: 12", "MTD: 12", "MTD: 11", "MTD: 12", "MTD: 12", "MTD: 12"), 
  YEAR = c(2012, 2013, 2014, 2015, 2013, 2014), 
  Client = c("A.", "A.", "A.", "B.", "B.", "B."), 
  Revenue = c(320, 205, 166L, 152, 150, 138),
  Col1 = c(651, 485, 533, 3932, 171, 436), 
  Col2 = c(478, 335, 305, 238, 115, 251), 
  Col3 = c(73, 69, 57, 6, 67, 57), 
  Col4 = c(6.7, 6.1, 5.5, 6.4, 13.1, 5.5)
)

# subset just the month=12 rows
test12 <- testing[testing$MONTH=="MTD: 12", ]
test12 <- test12[order(test12$Client, test12$YEAR), ]

# define a function to calculate percent change
pctchange <- function(x) {
  L <- length(x)
  c(NA, 100 * (x[-1] - x[-L]) / x[-L])
}

# calculate percent change for all columns, by client
change <- apply(test12[, c("Revenue", "Col1", "Col2", "Col3", "Col4")], 2,
  function(y) unlist(tapply(y, test12$Client, pctchange)))
change <- data.frame(change)
names(change) <- paste0("d", names(change))
test12b <- cbind(test12[, c("MONTH", "YEAR", "Client")], change)

# merge back with monthly data
merge(testing, test12b, all=TRUE)

So after running this code you get a list that has been splitted by client.

I want to run the following code that will essentially remove the 2nd row if the number of rows for that factor (the client) is greater than 2.

I tried this already and it didn't work:

testing<-ifelse(length(splitresult)>2,splitresult[-2,],splitresult)

Ultimate goal out of all of this:

1) To just get the percent change of the last year with the previous year and don't show the inbetween stuff like NA for the previous year. But if it is a new client I do want the NA there to specify it is a new client. So that is why I tried out the code above which didn't work.

2) I want to reorder the clients in the split by Revenue in MTD: 12 2014.

splitlist[order(sapply(splitlist, function(x) (x[["Revenue"]])))]

(Didn't work: assume splitlist is the name of the list)

If anyone can help me with either question it would be extremely helpful. Thanks!


Solution

  • I think the plyr package would help here. For example, instead of the last line of code where you use ifelse, you could try

    library(plyr)
    out = ddply(splitresult, "Client", function(x){
      if(dim(x)[1] > 2) x = x[-2,]
      return(x)
    })
    

    Here, x is a client-specific data frame, and out is the result of combining the rows of a bunch of client-specific data frames.

    You might also check out lubridate, which will make dates and times easier to deal with. As mentioned in the comments, dplyr would also be helpful, as would the rest of the "Hadleyverse" of packages for cleaning and plotting data. The solutions to your questions 1 and 2, along with the whole process of cleaning and summarizing, will be much cleaner and easier with the right tools.