Search code examples
rlistdataframedatatable

Relist rows of a data table in R following the structure of a desired list?


I am loading a csv file into a data table in R that looks like the below:

Name Location Dates.Start Dates.End ...
John Virtual 12/12/2022 13/12/2022 ...
Anne In person 05/01/2023 06/01/2023 ...
Ben Hybrid 10/01/2023 11/01/2023 ...

I need to iterate over each row and use it as a list with the following structure:

row_list = list(Name = 'John' ,
                Location = 'Virtual',
                Dates = data.frame(Start = '12/12/2022',
                                   End = '13/12/2022'),
                ...
               )

I have tried converting each row to a list and I can do it by iterating over all rows and creating the required entities.

for (row in nrow(dt)){
  row_list = list(Name = dt$Name[row] ,
                Location = dt$Location[row],
                Dates = data.frame(Start = dt$Dates.Start[row],
                                   End = dt$Dates.End[row]),
                ...
               )
}

The problem is that I have multiple columns in the original data table and I would prefer a relist command that would automatically identify and merge the columns that need to be stores as a data frame. If I use the list() command the columns won't merge in a data frame (i.e. the 'Dates' data frame). Is there a way to merge the two 'Dates' into one data frame while creating the list without needing to call each value separately?


Solution

  • In this solution, first I prepare a list of columns toDf that must be bound in a data.frame. Then I get the list of other columns by difference (setdiff). The two sets then are merged into the Cols list.

    library(purrr)
    
    toDf <- list(Dates = c("Dates.Start", "Dates.End"))
    otherCols <- setdiff(names(dt), reduce(toDf, c))
    Cols <- c(setNames(otherCols, otherCols), toDf)
    
    Cols
    
    $Name
    [1] "Name"
    
    $Location
    [1] "Location"
    
    $Dates
    [1] "Dates.Start" "Dates.End"  
    
    

    Then by using asplit the data.frame is split by row and each row is processed according to elements of Cols. When there is an element containing two or more column names, the corresponding data is merged into a data.frame

    asplit(dt,1) |>
        map(~{
            data <- as.list(.x)
            map(Cols, ~if(length(.x) == 1)
                           data[[.x]]
                       else
                           data.frame(data[.x]))
        })
        
    ##> [[1]]
    ##> [[1]]$Name
    ##> [1] "John"
    ##> 
    ##> [[1]]$Location
    ##> [1] "Virtual"
    ##> 
    ##> [[1]]$Dates
    ##>   Dates.Start  Dates.End
    ##> 1  12/12/2022 13/12/2022
    ##> 
    ##> 
    ##> [[2]]
    ##> [[2]]$Name
    ##> [1] "Anne"
    ##> 
    ##> [[2]]$Location
    ##> [1] "In person"
    ##> 
    ##> [[2]]$Dates
    ##>   Dates.Start  Dates.End
    ##> 1  05/01/2023 06/01/2023
    ##> 
    ##> 
    ##> [[3]]
    ##> [[3]]$Name
    ##> [1] "Ben"
    ##> 
    ##> [[3]]$Location
    ##> [1] "Hybrid"
    ##> 
    ##> [[3]]$Dates
    ##>   Dates.Start  Dates.End
    ##> 1  10/01/2023 11/01/2023
    
    

    An alternative equivalent by column solution can be:

    map(Cols, ~if(length(.x) == 1)
                   dt[[.x]]
               else
                   split(dt[.x], ~ I(1:nrow(dt)))) |>
        transpose()