Make subset with specific values of a column with grep


I have the following data set:

   usd     year
1  65.09   1997
2  69.28   1998
3  71.18   1999Q1
4  72.12   1999Q2
5  70.68   1999Q3
6  71.01   1999Q4
7  71.45   2000Q1
8  72.02   2000Q2
9  72.29   2000Q3
10 71.12   2000Q4

I want the mean of usd for each year:

   usd     year
1  65.09   1997
2  69.28   1998
3  71.24   1999
7  71.72   2000

I know how to do this when the column contains only years, without the quarter suffix. Is there a way to extract just the year, maybe with grep?


Solution

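  • I found a solution using the stringr package: str_extract pulls the first run of four digits out of each value, and aggregate then averages usd per year.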

    mydata <- data.frame(usd = c(65.09,69.28,71.18,72.12,70.68,71.01,71.45,72.02,72.29,71.12),
                         year = c("1997","1998","1999Q1","1999Q2","1999Q3","1999Q4",
                                  "2000Q1","2000Q2","2000Q3","2000Q4"))
    
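    library(stringr)
    # Keep only the first run of four digits, so "1999Q1" becomes "1999"
    mydata$year <- str_extract(mydata$year, "[[:digit:]]{4}")
    # Average usd within each year
    mydata <- aggregate(usd ~ year, mydata, mean)
    mydata
    
    #   year     usd
    # 1 1997 65.0900
    # 2 1998 69.2800
    # 3 1999 71.2475
    # 4 2000 71.7200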
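
  • Since the question asks about grep: grep only reports which elements match a pattern, it does not cut out part of a string. A base R alternative, sketched here under the assumption that every quarterly value ends in Q1 to Q4, is to strip that suffix with sub and then aggregate as before. The name yearly is just an illustrative choice.

```r
# Same data as in the question; year is kept as character.
mydata <- data.frame(
  usd  = c(65.09, 69.28, 71.18, 72.12, 70.68, 71.01, 71.45, 72.02, 72.29, 71.12),
  year = c("1997", "1998", "1999Q1", "1999Q2", "1999Q3", "1999Q4",
           "2000Q1", "2000Q2", "2000Q3", "2000Q4"),
  stringsAsFactors = FALSE
)

# Drop a trailing quarter suffix such as "Q3" (assumed format);
# plain years like "1997" do not match and are left unchanged.
mydata$year <- sub("Q[1-4]$", "", mydata$year)

# Average usd within each year.
yearly <- aggregate(usd ~ year, data = mydata, FUN = mean)
yearly
```

    This should give the same four yearly means without loading an extra package. If the suffix can take other forms, substr(mydata$year, 1, 4) may be the simpler choice, as long as the year always occupies the first four characters.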