Search code examples
rcbind

How do we arrange two sets of data according to a specific criteria, eg date, in R


This is regarding data manipulation and cleaning in R.


I have dataset 1:

Date   time Range Waterconsumption    
1/1/01 0300 31km  2.0liters
2/1/01 0800 30km  1.8liters
3/1/01 0300 33km  1.7liters
4/1/01 0600 32km  1.8liters
5/1/01 0800 28km  1.7liters
6/1/01 0300 35km  1.6liters
7/1/01 0800 31km  1.8liters

And also dataset 2:

Date   time heatlost weight    
1/1/01 0300 0.27     61.5kg
2/1/01 0800 0.33     62.0kg
5/1/01 0800 0.69     61.7kg
6/1/01 0300 0.15     61.8kg
7/1/01 0800 0.63     62.0kg

As you can see, dataset 2 has lost some dates (from 3/1/01 to 4/1/01).

So how can I combine dataset 1 and 2 using cbind i.e. inserting heatlost and weight behind waterconsumption (dataset1) according to date?


Solution

  • You can use the library dplyr::left_join(df1, df2, "time")

    First let's generate some data to work with, reflecting the variables in your project above:

    df1 <- 
      data.frame(
        id = c(1:4),
        time = c(1:4), 
        range = floor(runif(4, 28,32)),
        watercon = round(runif(4,1.5,1.7),2)
      )
    
    df2 <- 
      data.frame(
        id = c(1,4),
        time = c(1,4), 
        heatlost = c(0.25,0.33),
        weight = c(62.5,61.4)
      )
    

    df2 has some missing values as per your original questions, and when we apply a left_join these values will be replaced with an NA.

    If you apply left_join to join by "time" and then keep only the variables you want using select:

    library(dplyr)
    left_join(df1, df2, "time") %>% 
      select(time, range, watercon, heatlost, weight)
    

    You'll get the returning dataframe:

    time       range   watercon heatlost    weight
        1          30      1.52     0.25      62.5
        2          29      1.55       NA        NA
        3          29      1.51       NA        NA
        4          30      1.53     0.33       61.4