I recently asked a question about counting the number of times an element repeats in a large data frame (http://stackoverflow.com/questions/7669553/how-to-assign-number-of-repeats-to-dataframe-based-on-elements-of-an-identifying/7669607#7669607). I received some very helpful advice, which worked on a small number of rows, but I now need to perform the operation on a much larger data set (over 255k rows, with around 100k "groups" being formed by ddply):
system.time( data <- ddply(data, "uid", function(x) {x$time <- 1:nrow(x); x}) )
Here uid is the grouping variable, and I need to number the repeats within each group to get output like:
uid time
ny1 1
ny1 2
ny2 1
ny2 2
ny2 3
Running this on the larger data set makes R run out of memory. Are there any obvious solutions to this? Thanks in advance (especially for your patience, as I'm a new "programmer").
I posted a new answer to your original question, "How to assign number of repeats to dataframe based on elements of an identifying vector in R?".
That will hopefully help you there and here.
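For readers landing here first, one memory-lighter possibility is sketched below. It is not necessarily what the linked answer uses. It relies on base R's ave(), which splits only an integer index vector by uid instead of splitting the whole data frame into roughly 100k pieces as ddply does. The small data frame is made up to mirror the example output in the question; whether this is enough for 255k rows on a given machine is an assumption to check.

```r
# Toy data mirroring the question's example (made-up values).
data <- data.frame(uid = c("ny1", "ny1", "ny2", "ny2", "ny2"),
                   stringsAsFactors = FALSE)

# Number the rows within each uid group.
# ave() splits the integer vector seq_along(data$uid) by uid,
# applies seq_along() to each piece, and puts the results back
# in the original row order.
data$time <- ave(seq_along(data$uid), data$uid, FUN = seq_along)

print(data)
#   uid time
# 1 ny1    1
# 2 ny1    2
# 3 ny2    1
# 4 ny2    2
# 5 ny2    3
```

Because the counter is assigned back by position, the rows do not need to be sorted by uid first.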