Search code examples
rdatecountdelete-row

Delete next row after every count/list of day; in R


I am finding it difficult to wrap my head around this: In the dataframe below I want to delete the next row after every count/list of, say, Thursday, same for Friday and so on. I would prefer not using a loop since the data is big.

mydata<- read.table(header=TRUE, text=" 

    Date     AAPL.ret  Weekday Thursday

1 2001-01-04 0.000000000 Thursday 1 2 2001-01-04 0.000000000 Thursday 1 3 2001-01-04 -0.025317808 Thursday 1 4 2001-01-04 0.014545711 Thursday 1 5 2001-01-04 0.007194276 Thursday 1 6 2001-01-04 -0.007194276 Thursday 1 7 2001-01-05 -0.0278569545 Friday 0 8 2001-01-05 0.0056338177 Friday 0 9 2001-01-05 0.0037383221 Friday 0 10 2001-01-05 0.0000000000 Friday 0 11 2002-02-25 3.511856e-03 Monday 0 12 2002-02-25 -3.511856e-03 Monday 0 13 2002-02-25 -4.398505e-04 Monday 0 14 2002-02-25 -2.643173e-03 Monday 0 15 2002-02-25 4.401416e-03 Monday 0 16 2002-02-26 9.189066e-03 Tuesday 0 17 2002-02-26 -8.243166e-04 Tuesday 0 18 2002-02-26 9.533751e-03 Tuesday 0 19 2002-02-26 4.527688e-03 Tuesday 0 20 2002-02-26 4.105933e-04 Tuesday 0 ............. 100 2002-03-01 8.717651e-03 Friday 0 101 2002-03-01 1.990115e-02 Friday 0 102 2002-03-01 -1.344387e-03 Friday 0 103 2002-03-01 -1.445373e-02 Friday 0 ") The output I need should be like this:

    Date     AAPL.ret  Weekday Thursday

1 2001-01-04 0.000000000 Thursday 1 2 2001-01-04 0.000000000 Thursday 1 3 2001-01-04 -0.025317808 Thursday 1 4 2001-01-04 0.014545711 Thursday 1 5 2001-01-04 0.007194276 Thursday 1 6 2001-01-04 -0.007194276 Thursday 1 7 2001-01-05 0.0056338177 Friday 0 8 2001-01-05 0.0037383221 Friday 0 9 2001-01-05 0.0000000000 Friday 0 11 2002-02-25 -3.511856e-03 Monday 0 12 2002-02-25 -4.398505e-04 Monday 0 13 2002-02-25 -2.643173e-03 Monday 0 14 2002-02-25 4.401416e-03 Monday 0 15 2002-02-26 -8.243166e-04 Tuesday 0 16 2002-02-26 9.533751e-03 Tuesday 0 17 2002-02-26 4.527688e-03 Tuesday 0 18 2002-02-26 4.105933e-04 Tuesday 0 ............. 100 2002-03-01 1.990115e-02 Friday 0 101 2002-03-01 -1.344387e-03 Friday 0 102 2002-03-01 -1.445373e-02 Friday 0

Thank you in advance. Sorry if I have wrongfully asked the question. This is my first time of asking a question here; I have tried to follow the rules as best as I can; especially how the table should appear.

The codes I have tried, I believe, are really far from the answer I desire. Just counting and subsetting; below. table(ret.df$Weekday=="Thursday") r1<-ret.df[!(ret.df$Weekday=="Thursday"),]

I hope my question less vague now.

A follow up from the previous answer:

removing rows based on condition in ret_1ON

ret_1ON<- ret.df[duplicated(ret.df$Date)|1:nrow(ret.df)==1,]

dim(ret_1ON)

[1] 98734 4

head(ret_1ON)

    Date     AAPL.ret  Weekday Thursday

1 2001-01-04 0.000000000 Thursday 1 2 2001-01-04 0.000000000 Thursday 1 3 2001-01-04 -0.025317808 Thursday 1 4 2001-01-04 0.014545711 Thursday 1 5 2001-01-04 0.007194276 Thursday 1 6 2001-01-04 -0.007194276 Thursday 1

tail(ret_1ON)
        Date      AAPL.ret  Weekday Thursday

99994 2006-01-19 0.0013771520 Thursday 1 99995 2006-01-19 -0.0007321584 Thursday 1 99996 2006-01-19 -0.0029026141 Thursday 1 99997 2006-01-19 -0.0002511616 Thursday 1 99998 2006-01-19 0.0011297309 Thursday 1 99999 2006-01-19 -0.0002509410 Thursday 1

I'm wandering why the last item in tail is not 98734 but rather 99999?

dim(ret.df)

[1] 99999 4 which means the condition was effected, though.


Solution

  • We can do this with data.table

    library(data.table)
    setDT(mydata)[, .SD[(seq_len(.N) != 1)], Date]
    

    if we wanted to keep the first row of the dataset

    setDT(mydata)[, .SD[(seq_len(.N) != 1)|seq_len(.N)==.I[1]], Date]
    

    Or with dplyr

    library(dplyr)
    mydata %>%
          group_by(Date) %>%
          filter(row_number() != 1)
    

    Or using base R, if the 'Date' column is ordered

    mydata[duplicated(mydata$Date),]
    

    or with including the first row

    mydata[duplicated(mydata$Date)|1:nrow(mydata)==1,]