I have two large data frames, each currently with over 300,000 observations and 100+ variables. For the sake of simplicity, let's assume I have df1:
> str(df1)
'data.frame': 3000 obs. of 3 variables:
$ Name : chr "AAA" "BBB" "CCC" "DDD" ...
$ DateTime : POSIXct, format: "2014-01-01 00:00:00" "2014-01-01 00:10:00" "2014-01-01 00:20:00" ...
$ Age : num 27 25 27 30 ...
df2:
> str(df2)
'data.frame': 3000 obs. of 3 variables:
$ HEX : Factor w/ 500 levels "AAA","BBB",..: 100 100 100 100 ...
$ DateTime : Factor w/ 3000 levels "2014-01-01 00:00:00",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Age : Factor w/ 500 levels "27","25",..: 100 100 100 100 ...
Both data frames hold the same values and have the same number of rows and columns, but their structures differ: every column in df2 is a factor.
I would like to convert the columns of df2 so that their classes match those of df1. Please advise. Thank you in advance.
Assuming the columns of both data frames are in exactly the same order as described, you can look up the class of each column of df1 with class() and apply it to the matching column of df2 with Map().
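df2[] <- Map(function(x, y) {
  x <- as.character(x)  # work on the factor labels, not the integer codes
  if ("POSIXct" %in% y)
    as.POSIXct(x)       # keeps the time of day; session time zone unless tz= is given
  else if ("Date" %in% y)
    as.Date(x)
  else
    `class<-`(x, y)     # e.g. "character" or "numeric"
}, df2, lapply(df1, class))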
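The function goes through as.character() before converting because a factor stores integer codes plus a table of labels. A minimal sketch of what happens if that step is skipped, using a made-up vector `age` that is not part of the question's data:

```r
# A factor stores integer codes that index into its sorted levels.
age <- factor(c("30", "27", "25", "28", "23"))

# Converting the factor directly returns the internal codes, not the ages.
codes <- as.numeric(age)
print(codes)
# [1] 5 3 2 4 1

# Going through character first recovers the original values.
values <- as.numeric(as.character(age))
print(values)
# [1] 30 27 25 28 23
```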
Before
lapply(df1, class)
# $name
# [1] "character"
#
# $date
# [1] "POSIXct" "POSIXt"
#
# $age
# [1] "numeric"
#
# $date2
# [1] "Date"
lapply(df2, class)
# $HEX
# [1] "factor"
#
# $date
# [1] "factor"
#
# $age
# [1] "factor"
#
# $date2
# [1] "factor"
Conversion
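df2[] <- Map(function(x, y) {
  x <- as.character(x)  # work on the factor labels, not the integer codes
  if ("POSIXct" %in% y)
    as.POSIXct(x)       # keeps the time of day; session time zone unless tz= is given
  else if ("Date" %in% y)
    as.Date(x)
  else
    `class<-`(x, y)     # e.g. "character" or "numeric"
}, df2, lapply(df1, class))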
After
lapply(df2, class)
# $HEX
# [1] "character"
#
# $date
# [1] "POSIXct" "POSIXt"
#
# $age
# [1] "numeric"
#
# $date2
# [1] "Date"
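Matching classes does not by itself show that the values survived the conversion, so it may be worth checking those as well. Below is a self-contained sketch on toy data modelled on the question, with timestamps ten minutes apart. The names `ref`, `fac` and `stamps` and the "UTC" time zone are assumptions made for this example only.

```r
# Toy data modelled on the question; the "UTC" time zone is an assumption.
stamps <- c("2014-01-01 00:00:00", "2014-01-01 00:10:00", "2014-01-01 00:20:00")
ref <- data.frame(Name = c("AAA", "BBB", "CCC"),
                  DateTime = as.POSIXct(stamps, tz = "UTC"),
                  Age = c(27, 25, 27),
                  stringsAsFactors = FALSE)
# The same values, but with every column stored as a factor.
fac <- data.frame(HEX = factor(ref$Name),
                  DateTime = factor(stamps),
                  Age = factor(ref$Age))

# Convert each column of 'fac' to the class of the matching column of 'ref'.
fac[] <- Map(function(x, y) {
  x <- as.character(x)
  if ("POSIXct" %in% y)
    as.POSIXct(x, tz = "UTC")
  else if ("Date" %in% y)
    as.Date(x)
  else
    `class<-`(x, y)
}, fac, lapply(ref, class))

# Column names differ (HEX vs Name), so compare classes without names.
same_classes <- identical(unname(lapply(fac, class)), unname(lapply(ref, class)))
same_times   <- all(fac$DateTime == ref$DateTime)
same_ages    <- all(fac$Age == ref$Age)
print(c(same_classes, same_times, same_ages))
```

All three checks should come back TRUE. If the timestamps in your data were recorded in another time zone, change the tz argument accordingly.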
Data
df1 <- structure(list(name = c("A", "B", "C", "D", "E"), date = structure(c(1577836800,
1580515200, 1583020800, 1585699200, 1588291200), class = c("POSIXct",
"POSIXt")), age = c(30, 27, 25, 28, 23), date2 = structure(c(18262,
18293, 18322, 18353, 18383), class = "Date")), row.names = c(NA,
-5L), class = "data.frame")
df2 <- structure(list(HEX = structure(1:5, .Label = c("A", "B", "C",
"D", "E"), class = "factor"), date = structure(1:5, .Label = c("2020-01-01 01:00:00",
"2020-02-01 01:00:00", "2020-03-01 01:00:00", "2020-04-01 02:00:00",
"2020-05-01 02:00:00"), class = "factor"), age = structure(c(5L,
3L, 2L, 4L, 1L), .Label = c("23", "25", "27", "28", "30"), class = "factor"),
date2 = structure(1:5, .Label = c("2020-01-01", "2020-02-01",
"2020-03-01", "2020-04-01", "2020-05-01"), class = "factor")), row.names = c(NA,
-5L), class = "data.frame")