I'm working with a list of lists called receipts. Each entry in receipts is a list representing one receipt. The structure of a receipt is consistent and looks like this:
> str(receipts[[1]])
List of 6
$ receipt_type : chr "SALESPERSON_ACTIVITY"
$ timestamp : POSIXct[1:1], format: "2020-01-01 09:29:00"
$ receipt_number: int 1195
$ POS : int 1
$ KNo : int 12
$ shift_number : int 9
The receipt_number may contain NA values as well.
I'd like to convert this list into a data frame with corresponding columns (receipt_type, timestamp, receipt_number, etc.). Currently I'm using this:
receipts_as_df <- as.data.frame(matrix(unlist(receipts), byrow=TRUE, ncol=length(receipts[[1]])))
This puts the data into a data frame. Sadly, unlist removes all information about the types of the data (I think everything is coerced to character). The column names are lost as well. So I end up with a data frame that holds all the data but has neither the original types nor the column names.
I know that I could rename the columns and convert the data types manually, but I was wondering whether there is a more convenient way of handling this situation.
Example: Currently the data frame looks like this:
> head(receipts_as_df)
V1 V2 V3 V4 V5 V6
1 SALESPERSON_ACTIVITY 1577867340 1195 1 12 9
2 CASH_REGISTER_MONITORING 1577867340 <NA> 1 12 9
3 PAYOUT_NOTIFICATION 1577867340 1196 1 12 9
4 TSE_ACTIVITY 1577869080 <NA> 1 12 9
5 BUSINESS_MODE_ACTIVITY 1577869080 <NA> 1 12 9
6 ZERO_RECEIPT 1577869140 1197 1 12 9
Base R, a little more work needed:
receipts <- replicate(3, list(
receipt_type = "SALESPERSON_ACTIVITY",
timestamp = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
receipt_number = 1195,
POS = 1,
KNo = 12,
shift_number = 9
), simplify = FALSE)
out <- do.call(rbind.data.frame, c(receipts, list(stringsAsFactors = FALSE)))
out
# receipt_type timestamp receipt_number POS KNo shift_number
# 2 SALESPERSON_ACTIVITY 1577870940 1195 1 12 9
# 21 SALESPERSON_ACTIVITY 1577870940 1195 1 12 9
# 3 SALESPERSON_ACTIVITY 1577870940 1195 1 12 9
str(out)
# 'data.frame': 3 obs. of 6 variables:
# $ receipt_type : chr "SALESPERSON_ACTIVITY" "SALESPERSON_ACTIVITY" "SALESPERSON_ACTIVITY"
# $ timestamp : num 1.58e+09 1.58e+09 1.58e+09
# $ receipt_number: num 1195 1195 1195
# $ POS : num 1 1 1
# $ KNo : num 12 12 12
# $ shift_number : num 9 9 9
# rbind dropped the POSIXct class; convert back, keeping the original UTC time zone
out$timestamp <- as.POSIXct(out$timestamp, origin = "1970-01-01", tz = "UTC")
out
# receipt_type timestamp receipt_number POS KNo shift_number
# 2 SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
# 21 SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
# 3 SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
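A possible base R variation, sketched here under assumptions: turning each receipt into a one-row data frame before binding should keep the column classes, so the timestamp would not need to be converted back by hand. The sample list receipts_na and the result name out2 are made up for illustration; the second entry has a missing receipt_number, as described in the question.

```r
# Hypothetical sample data mirroring the structure shown in the question.
# The second receipt has a missing receipt_number.
receipts_na <- list(
  list(receipt_type = "SALESPERSON_ACTIVITY",
       timestamp = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
       receipt_number = 1195L, POS = 1L, KNo = 12L, shift_number = 9L),
  list(receipt_type = "CASH_REGISTER_MONITORING",
       timestamp = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
       receipt_number = NA_integer_, POS = 1L, KNo = 12L, shift_number = 9L)
)

# Convert every receipt to a one-row data frame, then stack the rows.
# Binding data frames (rather than bare lists) lets rbind keep each column's class.
rows <- lapply(receipts_na, as.data.frame, stringsAsFactors = FALSE)
out2 <- do.call(rbind, rows)

str(out2)
```

This creates one small data frame per receipt, so it may be noticeably slower than the dplyr or data.table calls on long lists.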
dplyr and data.table, without extra work needed:
dplyr::bind_rows(receipts)
# # A tibble: 3 x 6
# receipt_type timestamp receipt_number POS KNo shift_number
# <chr> <dttm> <dbl> <dbl> <dbl> <dbl>
# 1 SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
# 2 SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
# 3 SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
data.table::rbindlist(receipts)
# receipt_type timestamp receipt_number POS KNo shift_number
# 1: SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
# 2: SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9
# 3: SALESPERSON_ACTIVITY 2020-01-01 09:29:00 1195 1 12 9