Is it possible to update multiple datasets using lapply in R?


I am currently trying to update multiple datasets by adding a new column to each of them.

I read the solution to this question. However, running

lapply(list(annual_2022_v2, bottom_2022_v2, q1_2022_v2, q2_2022_v2, q3_2022_v2, q4_2022_v2, top_2022_v2), transform, start_hour = hour(started_at))

only printed the correct output; it didn't update my original datasets or add the new column to them.

To test it on an individual dataset, I ran

lapply(list(q1_2022_v2), transform, start_hour = hour(started_at))

Although it printed the correct dataset with the new column, it didn't update the original dataset.

I am trying to figure out the "optimal" way to write some sort of loop, rather than hard-coding the same line for 8 different datasets, such as

q1_2022_v2$start_hour <- hour(q1_2022_v2$started_at)
q2_2022_v2$start_hour <- hour(q2_2022_v2$started_at)
q3_2022_v2$start_hour <- hour(q3_2022_v2$started_at)
q4_2022_v2$start_hour <- hour(q4_2022_v2$started_at)

I have also seen solutions using Map() and cbind(), but I am confused about how they work.


I eventually decided not to complicate things and just work with one dataset.


Solution

  • If you don't assign it, lapply's return value is lost. lapply is not a for loop; it is a functional: it returns a new list and never modifies its input. What you see printed is that return value.

    Start by putting these datasets into a list. I strongly suspect they all have the same structure, which means they should never have been separate objects; ideally, put them into a list when they are created or imported.

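    library(lubridate)  # hour() is assumed to come from lubridate
    # Collect every object named *_2022_v2 (run once: the list's own name matches too).
    all_2022_v2 <- mget(ls(pattern = glob2rx("*_2022_v2")))
    
    # Assign the result; lapply() returns a new list.
    all_2022_v2 <- lapply(all_2022_v2, transform, start_hour = hour(started_at))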
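
    If the original objects themselves must be updated, as the question asks, one option is list2env(), which copies each list element back to a variable of the same name. Below is a minimal sketch, not a definitive solution: the data frames q1_demo_v2 and q2_demo_v2 and their values are invented for illustration, and base R's as.POSIXlt(...)$hour stands in for hour() so that the example runs without extra packages.

    ```r
    # Toy stand-ins for the real datasets (hypothetical names and values).
    q1_demo_v2 <- data.frame(
      started_at = as.POSIXct(c("2022-01-05 08:15:00", "2022-02-10 17:40:00"), tz = "UTC")
    )
    q2_demo_v2 <- data.frame(
      started_at = as.POSIXct("2022-04-01 23:05:00", tz = "UTC")
    )

    # Build a named list; mget() keeps the object names as list names.
    demo_list <- mget(c("q1_demo_v2", "q2_demo_v2"), envir = environment())

    # lapply() returns a new list; nothing changes until the result is assigned.
    demo_list <- lapply(demo_list, function(d) {
      d$start_hour <- as.POSIXlt(d$started_at, tz = "UTC")$hour
      d
    })

    # Optional: overwrite the original objects in the current environment
    # (the global environment when run at top level).
    invisible(list2env(demo_list, envir = environment()))
    ```

    Keeping the data in the list is usually more convenient for later steps; writing back with list2env() is only needed if other code refers to the individual objects by name.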
    

    You should probably rbind the four quarterly datasets into a single data frame and add a grouping column that records the quarter.
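
    The stacking suggested here could look roughly like the following sketch. It assumes a named list of data frames with identical columns; the toy data, the columns ride_id and duration_min, and the grouping column name quarter are all invented for illustration.

    ```r
    # Toy quarterly data frames with identical columns (hypothetical values).
    quarters <- list(
      q1 = data.frame(ride_id = c("a", "b"), duration_min = c(12, 30)),
      q2 = data.frame(ride_id = "c", duration_min = 7)
    )

    # Stack all data frames into one.
    combined <- do.call(rbind, quarters)

    # Add the grouping column: repeat each list name once per row of its data frame.
    combined$quarter <- rep(names(quarters), times = vapply(quarters, nrow, integer(1)))

    # Drop the row names that rbind() derived from the list names.
    rownames(combined) <- NULL
    ```

    With a single data frame, a new column is added once for all quarters, and per-quarter results come from grouping on quarter rather than from looping over separate objects.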