R: Converting the structure of one data frame to match that of another data frame


I have two large data frames, each with over 300,000 observations and 100+ variables. For the sake of simplicity, let's assume the first one, df1, looks like this:

> str(df1)
'data.frame':   3000 obs. of  3 variables:
 $ Name         : chr  "AAA" "BBB" "CCC" "DDD" ...
 $ DateTime     : POSIXct, format: "2014-01-01 00:00:00" "2014-01-01 00:10:00" "2014-01-01 00:20:00" ...
 $ Age          : num  27 25 27 30 ...

df2:

> str(df2)
'data.frame':   3000 obs. of  3 variables:
 $ HEX          : Factor w/ 500 levels "AAA","BBB",..: 100 100 100 100 ...
 $ DateTime     : Factor w/ 3000 levels "2014-01-01 00:00:00",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Age          : Factor w/ 500 levels "27","25",..: 100 100 100 100 ...

Both data frames hold the same values and have the same number of rows and columns, but their structures differ: every column in df2 is a factor.

I would like to convert the columns of df2 so that their types match those of df1. Please advise, and thank you in advance.


Solution

  • Assuming the columns of both data frames are in exactly the same order, as described, you can use Map to pair each column of df2 with the class of the corresponding column in df1 and convert it accordingly.

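    df2[] <- Map(function(x, y) {
      if (any(grepl("POS", y)))
        # parse via character so the time of day is kept (session time zone)
        as.POSIXct(as.character(x))
      else if ("Date" %in% y)
        as.Date(x)
      else
        # go through character so factor labels, not level codes, are converted
        `class<-`(as.character(x), y)
      }, df2, lapply(df1, class))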
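
    The as.character() step in the last branch matters because a factor stores integer level codes behind its labels. Below is a minimal sketch of the difference, using a made-up factor f that is an assumption for illustration and not part of the data in this answer:

    ```r
    # Hypothetical factor holding ages as labels (illustration only)
    f <- factor(c("30", "27", "25", "28", "23"))

    # Direct conversion returns the internal level codes, not the ages
    codes <- as.numeric(f)
    # 5 3 2 4 1

    # Going through character first recovers the labelled values
    ages <- as.numeric(as.character(f))
    # 30 27 25 28 23

    # Setting the class on the character vector, as in the Map call,
    # coerces the same way
    ages2 <- `class<-`(as.character(f), "numeric")
    ```

    This is why every branch should work from the labels rather than from the factor's underlying integers.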
    

    Demonstration

    Before

    lapply(df1, class)
    # $name
    # [1] "character"
    # 
    # $date
    # [1] "POSIXct" "POSIXt" 
    # 
    # $age
    # [1] "numeric"
    # 
    # $date2
    # [1] "Date"
    
    lapply(df2, class)
    # $HEX
    # [1] "factor"
    # 
    # $date
    # [1] "factor"
    # 
    # $age
    # [1] "factor"
    # 
    # $date2
    # [1] "factor"
    

    Conversion

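    df2[] <- Map(function(x, y) {
      if (any(grepl("POS", y)))
        # parse via character so the time of day is kept (session time zone)
        as.POSIXct(as.character(x))
      else if ("Date" %in% y)
        as.Date(x)
      else
        # go through character so factor labels, not level codes, are converted
        `class<-`(as.character(x), y)
      }, df2, lapply(df1, class))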
    

    After

    lapply(df2, class)
    # $HEX
    # [1] "character"
    # 
    # $date
    # [1] "POSIXct" "POSIXt" 
    # 
    # $age
    # [1] "numeric"
    # 
    # $date2
    # [1] "Date"
    

    Data

    df1 <- structure(list(name = c("A", "B", "C", "D", "E"), date = structure(c(1577836800, 
    1580515200, 1583020800, 1585699200, 1588291200), class = c("POSIXct", 
    "POSIXt")), age = c(30, 27, 25, 28, 23), date2 = structure(c(18262, 
    18293, 18322, 18353, 18383), class = "Date")), row.names = c(NA, 
    -5L), class = "data.frame")
    
    df2 <- structure(list(HEX = structure(1:5, .Label = c("A", "B", "C", 
    "D", "E"), class = "factor"), date = structure(1:5, .Label = c("2020-01-01 01:00:00", 
    "2020-02-01 01:00:00", "2020-03-01 01:00:00", "2020-04-01 02:00:00", 
    "2020-05-01 02:00:00"), class = "factor"), age = structure(c(5L, 
    3L, 2L, 4L, 1L), .Label = c("23", "25", "27", "28", "30"), class = "factor"), 
        date2 = structure(1:5, .Label = c("2020-01-01", "2020-02-01", 
        "2020-03-01", "2020-04-01", "2020-05-01"), class = "factor")), row.names = c(NA, 
    -5L), class = "data.frame")
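
    To confirm that a conversion of this kind worked, the classes and values of the two data frames can be compared afterwards. The sketch below is one possible check, not part of the original answer. It uses small stand-in data frames, reduces the conversion to the Date and default branches, and assumes columns match by position. The names same_classes and same_values are made up for illustration.

    ```r
    # Small stand-ins: df1 has the target types, df2 is all factors
    df1 <- data.frame(name = c("A", "B"), age = c(30, 27),
                      date2 = as.Date(c("2020-01-01", "2020-02-01")),
                      stringsAsFactors = FALSE)
    df2 <- data.frame(HEX = c("A", "B"), age = c("30", "27"),
                      date2 = c("2020-01-01", "2020-02-01"),
                      stringsAsFactors = TRUE)

    # Same idea as the Map call above, without the POSIXct branch
    df2[] <- Map(function(x, y) {
      if ("Date" %in% y) as.Date(as.character(x))
      else `class<-`(as.character(x), y)
    }, df2, lapply(df1, class))

    # Optionally adopt df1's column names too (positional match assumed)
    names(df2) <- names(df1)

    # TRUE when each column has the same class as its counterpart in df1
    same_classes <- identical(lapply(df2, class), lapply(df1, class))

    # TRUE when the values agree as well
    same_values <- isTRUE(all.equal(df1, df2))
    ```

    With date-time columns, the values only agree if both data frames are interpreted in the same time zone, so same_values may be FALSE there even when the classes match.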