Tags: r, aggregate, sequences, panel-data

Creating a vector with multiple sequences based on number of IDs' repetitions


I have a data frame with panel data: subjects' characteristics over time. I need to create a column containing a sequence from 1 to the number of years each subject appears in the data. For example, if subject 1 appears from 2000 to 2005, I need the sequence 1, 2, 3, 4, 5, 6.
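Assuming the rows are sorted by subject (and by season within subject), the requirement is a counter that restarts at 1 whenever the subject ID changes. A minimal base-R sketch with hypothetical IDs (subject 2 is made up for illustration), using rle() to get the length of each run of identical IDs and sequence() to expand those lengths into concatenated 1..n runs:

```r
# Hypothetical rows, already sorted by subject:
# subject 1 spans 2000-2005 (6 rows), hypothetical subject 2 has 3 rows
subject <- c(1, 1, 1, 1, 1, 1, 2, 2, 2)

# Length of each consecutive run of identical subject IDs: 6, 3
runs <- rle(subject)$lengths

# sequence() concatenates seq_len(6) and seq_len(3)
counter <- sequence(runs)
print(counter)
# [1] 1 2 3 4 5 6 1 2 3
```

This relies on the sort order; if rows of one subject are scattered, rle() would treat each block as a separate subject.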

Below is a small sample of my data. The last column (exp) is what I am trying to get. Note that for the first subject (13), qtty is zero in 2008. For such rows I just need NA or a code (0, 1, -9999); it doesn't matter which one.

Below the data is the code I tried in order to build that vector, but it didn't work.

Any help will be much appreciated.

subject season qtty exp
    13  2000    29  1
    13  2001    29  2
    13  2002    29  3
    13  2003    29  4
    13  2004    29  5
    13  2005    27  6
    13  2006    27  7
    13  2007    27  8
    13  2008    0   NA
    28  2000    18  1
    28  2001    18  2
    28  2002    18  3
    28  2003    18  4
    28  2004    18  5
    28  2005    18  6
    28  2006    18  7
    28  2007    18  8
    28  2008    18  9
    28  2009    20  10
    28  2010    20  11
    28  2011    20  12
    28  2012    20  13
    35  2000    21  1
    35  2001    21  2
    35  2002    21  3
    35  2003    21  4
    35  2004    21  5
    35  2005    21  6
    35  2006    21  7
    35  2007    21  8
    35  2008    21  9
    35  2009    14  10
    35  2010    11  11
    35  2011    11  12
    35  2012    10  13
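To make the example reproducible, the sample above can be entered as a data frame. The name dat is borrowed from the answer below, the values are copied from the table, and exp is left out because it is the column to be computed:

```r
# Sample data from the table above (exp omitted; it is the target column)
dat <- data.frame(
  subject = rep(c(13, 28, 35), times = c(9, 13, 13)),
  season  = c(2000:2008, 2000:2012, 2000:2012),
  qtty    = c(rep(29, 5), rep(27, 3), 0,     # subject 13
              rep(18, 9), rep(20, 4),        # subject 28
              rep(21, 9), 14, 11, 11, 10)    # subject 35
)
str(dat)
```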

My code:

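# Count the seasons (rows) per subject
numbY<-aggregate(season ~ subject, data = toCountY, length)
colnames(numbY)<-c("subject","inFish")
# Attach each subject's count to its rows
toCountY$inFish<-numbY$inFish[match(toCountY$subject,numbY$subject)]
numbYbyFisher<-unique(numbY)
# Returns one sequence per subject (a list column here), not a vector aligned with the rows of toCountY
seqY<-aggregate(numbYbyFisher$inFish, by=list(numbYbyFisher$subject), function(x)seq(1,x,1))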
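The per-subject counts from the first aggregate() call are most of the way there; what is missing is laying the 1..n runs out row by row instead of collecting them in a list. A base-R sketch, assuming the rows can be sorted by subject and then season (sequence() is a swapped-in step, not part of the original attempt):

```r
# Sample data: same values as the table in the question
toCountY <- data.frame(
  subject = rep(c(13, 28, 35), times = c(9, 13, 13)),
  season  = c(2000:2008, 2000:2012, 2000:2012),
  qtty    = c(rep(29, 5), rep(27, 3), 0, rep(18, 9), rep(20, 4),
              rep(21, 9), 14, 11, 11, 10)
)

# Assumption: sort rows by subject, then season, so row groups follow
# the same (ascending) subject order that aggregate() uses for its output
toCountY <- toCountY[order(toCountY$subject, toCountY$season), ]

# Rows per subject, as in the original attempt
numbY <- aggregate(season ~ subject, data = toCountY, length)
colnames(numbY) <- c("subject", "inFish")

# sequence(c(9, 13, 13)) gives 1..9, 1..13, 1..13 as one vector
toCountY$exp <- sequence(numbY$inFish)

# Rows with zero qtty get NA instead of a counter value
toCountY$exp[toCountY$qtty == 0] <- NA
```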

Solution

  • I use ddply (from the plyr package) and distinguish two cases:

    Either generate a sequence along subject and replace it with NA where qtty is zero:

    library(plyr)  # ddply(); numbers rows 1..n per subject, NA where qtty is zero
    ddply(dat, .(subject), transform, new.exp = ifelse(qtty == 0, NA, seq_along(subject)))
    

    Or generate a sequence that counts only the rows where qtty is non-zero, with NA (a gap in the count) where qtty is zero:

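    ddply(dat, .(subject), transform, new.exp = {
      # Running count of non-zero rows; zero rows get NA.
      # cumsum() also copes with several zero rows per subject,
      # which append() with a vector 'after' argument does not.
      nz <- qtty != 0
      ifelse(nz, cumsum(nz), NA)
    })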
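If plyr is not available, both variants can be sketched in base R with ave(), which applies a function within each group and returns a vector aligned with the original rows. This is a swapped-in technique rather than the answer's ddply code; the data frame holds the values from the question's table:

```r
dat <- data.frame(
  subject = rep(c(13, 28, 35), times = c(9, 13, 13)),
  season  = c(2000:2008, 2000:2012, 2000:2012),
  qtty    = c(rep(29, 5), rep(27, 3), 0, rep(18, 9), rep(20, 4),
              rep(21, 9), 14, 11, 11, 10)
)

# Variant 1: 1..n within each subject, NA where qtty is zero
exp1 <- ave(seq_along(dat$subject), dat$subject, FUN = seq_along)
exp1[dat$qtty == 0] <- NA

# Variant 2: running count of non-zero rows within each subject,
# NA where qtty is zero (the count resumes after the gap)
nz   <- as.integer(dat$qtty != 0)
exp2 <- ave(nz, dat$subject, FUN = cumsum)
exp2[nz == 0] <- NA
```

Because ave() keeps the original row order, either result can be assigned straight back as a column (for example dat$exp <- exp1). In this sample the zero is the last row of subject 13, so both variants agree; they differ only when a zero falls in the middle of a subject's years.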