This is regarding data manipulation and cleaning in R.
I have dataset 1:
Date time Range Waterconsumption
1/1/01 0300 31km 2.0liters
2/1/01 0800 30km 1.8liters
3/1/01 0300 33km 1.7liters
4/1/01 0600 32km 1.8liters
5/1/01 0800 28km 1.7liters
6/1/01 0300 35km 1.6liters
7/1/01 0800 31km 1.8liters
And also dataset 2:
Date time heatlost weight
1/1/01 0300 0.27 61.5kg
2/1/01 0800 0.33 62.0kg
5/1/01 0800 0.69 61.7kg
6/1/01 0300 0.15 61.8kg
7/1/01 0800 0.63 62.0kg
As you can see, dataset 2 has lost some dates (from 3/1/01 to 4/1/01).
So how can I combine dataset 1 and 2 using cbind i.e. inserting heatlost and weight behind waterconsumption (dataset1) according to date?
You can use the library dplyr::left_join(df1, df2, "time")
First let's generate some data to work with, reflecting the variables in your project above:
df1 <-
data.frame(
id = c(1:4),
time = c(1:4),
range = floor(runif(4, 28,32)),
watercon = round(runif(4,1.5,1.7),2)
)
df2 <-
data.frame(
id = c(1,4),
time = c(1,4),
heatlost = c(0.25,0.33),
weight = c(62.5,61.4)
)
df2
has some missing values as per your original questions, and when we apply a left_join
these values will be replaced with an NA
.
If you apply left_join
to join by "time" and then keep only the variables you want using select
:
library(dplyr)
left_join(df1, df2, "time") %>%
select(time, range, watercon, heatlost, weight)
You'll get the returning dataframe:
time range watercon heatlost weight
1 30 1.52 0.25 62.5
2 29 1.55 NA NA
3 29 1.51 NA NA
4 30 1.53 0.33 61.4