r datasets sas-macro

How to write a function for multiple datasets in R, like a macro in SAS


I have monthly data from Jan 2021 to Feb 2022. The datasets are named CARD_202101, CARD_202102, CARD_202103, and so on up to CARD_202202. Each dataset contains these variables:

  • CIF
  • Date
  • Descriptions

How can I write a function in R so that every dataset from Jan 2021 to Feb 2022 keeps only the CIF and Date variables?

Example:

CARD_202101 <- data.frame(CIF = c(1,2,3),
               Date = c('2021-01-01', '2021-01-02', '2021-01-01'),
               Descriptions = c("a", "b", "c"))
CARD_202102 <- data.frame(CIF = c(1,6,3),
               Date = c('2021-02-01', '2021-02-02', '2021-01-01'),
               Descriptions = c("a", "b", "c"))
....


CARD_202202 <- data.frame(CIF = c(4,2,3),
               Date = c('2022-02-01', '2022-02-02', '2022-02-01'),
               Descriptions = c("a", "b", "c"))

I want each dataset to contain only the CIF and Date variables, like this:

CARD_202101 <- data.frame(CIF = c(1,2,3),
               Date = c('2021-01-01', '2021-01-02', '2021-01-01'))

CARD_202102 <- data.frame(CIF = c(1,6,3),
               Date = c('2021-02-01', '2021-02-02', '2021-01-01'))
....


CARD_202202 <- data.frame(CIF = c(4,2,3),
               Date = c('2022-02-01', '2022-02-02', '2022-02-01'))

I need to loop through all the datasets.

Solution

  • I will assume a few things first:

    1. the datasets follow a naming convention: each name starts with CARD_ followed by 6 digits
    2. I can use packages
    3. the objects are in the global environment

    If so, I recommend looping through all the objects that match the naming convention and binding them with data.table::rbindlist, like this:

    bind_datasets <- function()
    {
        data.table::rbindlist(
            l = lapply(
                X = ls(envir = globalenv(), pattern = "^(CARD_)\\d{6}$"),
                FUN = function(i)
                {
                    res <- get(x = i, envir = globalenv())
                    res <- subset(x = res, select = c("CIF", "Date"))
                    return(res)
                }
            )
        )    
    }
    

    This function:

    1. searches the global environment for objects whose names match the established pattern
    2. for each object, retrieves the CIF and Date columns
    3. binds the results one after another into a single data.table.
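
    For readers who cannot install data.table, the three steps above can be sketched in base R. This is only an illustration under stated assumptions: the two sample data frames are copied from the question, card_names, card_list and combined are hypothetical variable names, and do.call(rbind, ...) stands in for data.table::rbindlist (it returns a plain data.frame and is usually slower on many large tables).

    ```r
    # Sample data from the question; in real use these objects already exist.
    CARD_202101 <- data.frame(CIF = c(1, 2, 3),
                              Date = c("2021-01-01", "2021-01-02", "2021-01-01"),
                              Descriptions = c("a", "b", "c"))
    CARD_202102 <- data.frame(CIF = c(1, 6, 3),
                              Date = c("2021-02-01", "2021-02-02", "2021-01-01"),
                              Descriptions = c("a", "b", "c"))

    # Step 1: names of all global objects that match the naming convention.
    card_names <- ls(envir = globalenv(), pattern = "^CARD_\\d{6}$")

    # Step 2: fetch each object and keep only the CIF and Date columns.
    card_list <- lapply(
        X = mget(card_names, envir = globalenv()),
        FUN = function(df) df[, c("CIF", "Date"), drop = FALSE]
    )

    # Step 3: stack the pieces into one data frame (base R stand-in for rbindlist).
    combined <- do.call(rbind, card_list)
    ```

    Because mget returns a named list, the row names of combined carry the source dataset name, which can help trace a row back to its month.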

    EDIT:

    After your comment, I think the answer is the code below, which overwrites each dataset in place with only its CIF and Date columns:

    # find all objects in the global environment whose names match the pattern
    datasets <- ls(envir = globalenv(), pattern = "^(CARD_)\\d{6}$")
    # loop over them, replacing each dataset with its CIF/Date subset
    for (dst in datasets)
    {
        res <- get(x = dst, envir = globalenv())
        assign(
            x = dst, 
            value = subset(x = res, select = c("CIF", "Date")), 
            envir = globalenv()
        )
    }
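
    Since the question asks for something like a SAS macro, the loop above could also be wrapped in a reusable function. The following is a minimal sketch, not part of the original answer: keep_columns and its arguments keep, pattern and envir are hypothetical names, and demo_env is a throwaway environment used here so the example does not touch your real global objects. It uses `[` with drop = FALSE instead of subset, which behaves the same for a character vector of column names.

    ```r
    # Hypothetical helper: keeps only the chosen columns in every matching dataset.
    keep_columns <- function(keep = c("CIF", "Date"),
                             pattern = "^CARD_\\d{6}$",
                             envir = globalenv())
    {
        # names of all objects in 'envir' that match the naming convention
        datasets <- ls(envir = envir, pattern = pattern)
        for (dst in datasets)
        {
            res <- get(x = dst, envir = envir)
            # overwrite the object with its column subset
            assign(x = dst, value = res[, keep, drop = FALSE], envir = envir)
        }
        # return the names of the modified objects without printing them
        invisible(datasets)
    }

    # Usage on sample data from the question, stored in a separate environment.
    demo_env <- new.env()
    demo_env$CARD_202101 <- data.frame(CIF = c(1, 2, 3),
                                       Date = c("2021-01-01", "2021-01-02", "2021-01-01"),
                                       Descriptions = c("a", "b", "c"))
    demo_env$CARD_202102 <- data.frame(CIF = c(1, 6, 3),
                                       Date = c("2021-02-01", "2021-02-02", "2021-01-01"),
                                       Descriptions = c("a", "b", "c"))

    changed <- keep_columns(envir = demo_env)
    ```

    Calling keep_columns() with no arguments would act on the global environment, matching the loop above; passing a different keep or pattern reuses the same logic for other columns or dataset families.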