R - How to preserve data types and titles when converting list of lists to data frame


I'm working with a list of lists called receipts. Each entry in receipts is a list representing one receipt. The structure of a receipt is consistent and looks like this:

> str(receipts[[1]])
List of 6
 $ receipt_type  : chr "SALESPERSON_ACTIVITY"
 $ timestamp     : POSIXct[1:1], format: "2020-01-01 09:29:00"
 $ receipt_number: int 1195
 $ POS           : int 1
 $ KNo           : int 12
 $ shift_number  : int 9

The receipt_number may contain NA values as well.

I'd like to convert this list into a data frame with corresponding columns (receipt_type, timestamp, receipt_number, etc.). Currently I'm using this:

receipts_as_df <- as.data.frame(matrix(unlist(receipts), byrow=TRUE, ncol=length(receipts[[1]])))

This puts the data into a data frame. Sadly, unlist removes all information about the data types (I think everything is coerced to character), and the column names get lost as well. So I end up with a data frame that holds all the data but has neither the original types nor the column names.
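
To illustrate the coercion, here is a minimal sketch using a single made-up receipt (the values are invented for illustration and are not taken from the real data):

```r
# One hypothetical receipt with mixed types (values invented for illustration)
receipt <- list(
  receipt_type   = "SALESPERSON_ACTIVITY",
  timestamp      = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
  receipt_number = 1195L
)

# unlist() returns an atomic vector, so every element is coerced to one
# common type (character here) and the POSIXct class is dropped
flat <- unlist(receipt)
class(flat)           # "character"
flat[["timestamp"]]   # seconds since the epoch, stored as text

# matrix() then discards the element names, so as.data.frame() falls back
# to the default column names V1, V2, ...
names(as.data.frame(matrix(flat, nrow = 1)))
```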

I know that I could rename the columns and reset the data types manually, but I was wondering whether there is a more convenient way of handling this situation.

Example: currently the data frame looks like this:

> head(receipts_as_df)
                        V1         V2   V3 V4 V5 V6
1     SALESPERSON_ACTIVITY 1577867340 1195  1 12  9
2 CASH_REGISTER_MONITORING 1577867340 <NA>  1 12  9
3      PAYOUT_NOTIFICATION 1577867340 1196  1 12  9
4             TSE_ACTIVITY 1577869080 <NA>  1 12  9
5   BUSINESS_MODE_ACTIVITY 1577869080 <NA>  1 12  9
6             ZERO_RECEIPT 1577869140 1197  1 12  9

Solution

  • Base R, a little more work needed:

    receipts <- replicate(3, list(
      receipt_type   = "SALESPERSON_ACTIVITY",
      timestamp      = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
      receipt_number = 1195,
      POS            = 1,
      KNo            = 12,
      shift_number   = 9
    ), simplify = FALSE)
    
    out <- do.call(rbind.data.frame, c(receipts, list(stringsAsFactors = FALSE)))
    out
    #            receipt_type  timestamp receipt_number POS KNo shift_number
    # 2  SALESPERSON_ACTIVITY 1577870940           1195   1  12            9
    # 21 SALESPERSON_ACTIVITY 1577870940           1195   1  12            9
    # 3  SALESPERSON_ACTIVITY 1577870940           1195   1  12            9
    str(out)
    # 'data.frame': 3 obs. of  6 variables:
    #  $ receipt_type  : chr  "SALESPERSON_ACTIVITY" "SALESPERSON_ACTIVITY" "SALESPERSON_ACTIVITY"
    #  $ timestamp     : num  1.58e+09 1.58e+09 1.58e+09
    #  $ receipt_number: num  1195 1195 1195
    #  $ POS           : num  1 1 1
    #  $ KNo           : num  12 12 12
    #  $ shift_number  : num  9 9 9
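    # timestamp came back as plain numeric seconds (see str above), so restore the
    # class; pass tz explicitly, otherwise times print in the session's local zone
    out$timestamp <- as.POSIXct(out$timestamp, origin = "1970-01-01", tz = "UTC")
    out
    #            receipt_type           timestamp receipt_number POS KNo shift_number
    # 2  SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9
    # 21 SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9
    # 3  SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9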
    

  • dplyr and data.table, no extra work needed:

    dplyr::bind_rows(receipts)
    # # A tibble: 3 x 6
    #   receipt_type         timestamp           receipt_number   POS   KNo shift_number
    #   <chr>                <dttm>                       <dbl> <dbl> <dbl>        <dbl>
    # 1 SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195     1    12            9
    # 2 SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195     1    12            9
    # 3 SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195     1    12            9
    data.table::rbindlist(receipts)
    #            receipt_type           timestamp receipt_number POS KNo shift_number
    # 1: SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9
    # 2: SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9
    # 3: SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9
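
If you want to stay in base R, one possible way to avoid the manual timestamp fix is to turn each receipt into a one-row data frame first and then row-bind those. This is a minimal sketch, assuming every receipt has the same fields and each field has length 1; the sample values, including the NA receipt number, are made up:

```r
# Hypothetical sample data: two receipts, the second without a receipt number
receipts <- list(
  list(receipt_type = "SALESPERSON_ACTIVITY",
       timestamp = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
       receipt_number = 1195L, POS = 1L, KNo = 12L, shift_number = 9L),
  list(receipt_type = "TSE_ACTIVITY",
       timestamp = as.POSIXct("2020-01-01 09:58:00", tz = "UTC"),
       receipt_number = NA_integer_, POS = 1L, KNo = 12L, shift_number = 9L)
)

# Build one single-row data frame per receipt, then stack them;
# rbind on data frames keeps the column names and classes
rows <- lapply(receipts, as.data.frame, stringsAsFactors = FALSE)
out  <- do.call(rbind, rows)

str(out)
```

Because this creates one data frame per receipt, it is likely to be noticeably slower than bind_rows or rbindlist on long lists.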