I am loading a csv file into a data table in R that looks like the below:
Name | Location | Dates.Start | Dates.End | ... |
---|---|---|---|---|
John | Virtual | 12/12/2022 | 13/12/2022 | ... |
Anne | In person | 05/01/2023 | 06/01/2023 | ... |
Ben | Hybrid | 10/01/2023 | 11/01/2023 | ... |
I need to iterate over each row and use it as a list with the following structure:
row_list = list(Name = 'John' ,
Location = 'Virtual',
Dates = data.frame(Start = '12/12/2022',
End = '13/12/2022'),
...
)
I have tried converting each row to a list and I can do it by iterating over all rows and creating the required entities.
for (row in nrow(dt)){
row_list = list(Name = dt$Name[row] ,
Location = dt$Location[row],
Dates = data.frame(Start = dt$Dates.Start[row],
End = dt$Dates.End[row]),
...
)
}
The problem is that I have multiple columns in the original data table and I would prefer a relist command that would automatically identify and merge the columns that need to be stores as a data frame.
If I use the list()
command the columns won't merge in a data frame (i.e. the 'Dates' data frame).
Is there a way to merge the two 'Dates' into one data frame while creating the list without needing to call each value separately?
In this solution, first I prepare a list of columns toDf
that must be bound in a data.frame. Then I get the list of other columns by difference (setdiff
). The two sets then are merged into the Cols
list.
library(purrr)
toDf <- list(Dates = c("Dates.Start", "Dates.End"))
otherCols <- setdiff(names(dt), reduce(toDf, c))
Cols <- c(setNames(otherCols, otherCols), toDf)
Cols
$Name
[1] "Name"
$Location
[1] "Location"
$Dates
[1] "Dates.Start" "Dates.End"
Then by using asplit
the data.frame is split by row and each row is processed according to elements of Cols
. When there is an element containing two or more column names, the corresponding data is merged into a data.frame
asplit(dt,1) |>
map(~{
data <- as.list(.x)
map(Cols, ~if(length(.x) == 1)
data[[.x]]
else
data.frame(data[.x]))
})
##> [[1]]
##> [[1]]$Name
##> [1] "John"
##>
##> [[1]]$Location
##> [1] "Virtual"
##>
##> [[1]]$Dates
##> Dates.Start Dates.End
##> 1 12/12/2022 13/12/2022
##>
##>
##> [[2]]
##> [[2]]$Name
##> [1] "Anne"
##>
##> [[2]]$Location
##> [1] "In person"
##>
##> [[2]]$Dates
##> Dates.Start Dates.End
##> 1 05/01/2023 06/01/2023
##>
##>
##> [[3]]
##> [[3]]$Name
##> [1] "Ben"
##>
##> [[3]]$Location
##> [1] "Hybrid"
##>
##> [[3]]$Dates
##> Dates.Start Dates.End
##> 1 10/01/2023 11/01/2023
An alternative equivalent by column solution can be:
map(Cols, ~if(length(.x) == 1)
dt[[.x]]
else
split(dt[.x], ~ I(1:nrow(dt)))) |>
transpose()