Search code examples
rdataframedatelubridatereadr

Parsing a bi-annual time variable


I wanted to parse a bi-annual time variable from a dataframe when reading it in using read_csv.

The bi-annual time variable has the format Year-B1 and Year-B2. Here is a little example of how my data is looking like and what I want to do.

#example
#creating some tibble to demonstrate how my data looks like: 

library(tidyverse)

tb <- tibble (time = c("2021-B1","2021-B2","2022-B1","2022-B2"),
              country = rep("USA",4),
              value = c(3,5,4,7)
)
> tb
# A tibble: 4 × 3
  time    country value
  <chr>   <chr>   <dbl>
1 2021-B1 USA         3
2 2021-B2 USA         5
3 2022-B1 USA         4
4 2022-B2 USA         7

#saving the tibble as csv file 

write_csv(tb, "tb.csv")

Now my plan is to read the csv file and specify columns; but I don't know how to specify the col_date()

tb_r <- read_csv("tb.csv", col_types = cols( time= col_date(format= "%Y-%B"), 
                                               country = col_character(), 
                                               value = col_double() ))

Ideally I would like to have the time variable be displayed for Year-B1 as Year:Jun and for Year-B2 as Year:Dec.


Solution

  • We can replace :B1 and :B2 with your desired months.

    tb %>% 
      mutate(time_stamp = str_replace_all(time, c(":B1" = ":Jun", ":B2" = ":Dec"))) 
    
    # # A tibble: 4 x 4
    #   time    country value time_stamp
    #   <chr>   <chr>   <dbl> <chr>     
    # 1 2021:B1 USA         3 2021:Jun  
    # 2 2021:B2 USA         5 2021:Dec  
    # 3 2022:B1 USA         4 2022:Jun  
    # 4 2022:B2 USA         7 2022:Dec  
    

    There is no built-in format for the way your data is structured. So you need to post-process it. It can be done in one pipeline though;

    tb_r <- read_csv("tb.csv", col_types = cols( time= col_character(), 
                                                   country = col_character(), 
                                                   value = col_double() )) %>%
              mutate(time = str_replace_all(time, c(":B1" = ":Jun", ":B2" = ":Dec")))