I have to subtract the daily average temperature to each temperature value during the day also taking in account the station it comes from, along 9 months. I already have the average, my data is something like this:
>>> df1
Station Date Temperature
0 Station1 2022-05-1 9:30:00 7,4
1 Station1 2022-05-1 9:45:00 7,45
2 Station1 2022-05-1 10:00:00 8,2
3 Station1 2022-05-1 10:15:00 8,4
4 Station1 2022-05-1 10:30:00 8,9
5 Station1 2022-05-1 9:30:00 7,5
6 Station2 2022-05-1 9:45:00 7,56
7 Station2 2022-05-1 10:00:00 8,4
8 Station2 2022-05-1 10:15:00 8,7
9 Station2 2022-05-1 10:30:00 8,1
10 ...
>>> df2
Station Date AverageTemperaturePerDayAndStation
0 Station1 2022-05-1 8
1 Station1 2022-05-2 8,3
2 Station1 2022-05-3 8,6
3 Station1 2022-05-4 8,4
4 Station1 2022-05-5 7,9
5 Station2 2022-05-1 6
6 Station2 2022-05-2 7,3
7 Station2 2022-05-3 8,6
8 Station2 2022-05-4 7,4
9 Station2 2022-05-5 6,9
10 ...
So I want R to substranct Temperature - AverageTemperaturePerDayAndStation just like this:
>>> df3
Station Date CorrectedTemperature
0 Station1 2022-05-1 9:30:00 7,4 - 8
1 Station1 2022-05-1 9:45:00 7,45 - 8
2 Station1 2022-05-1 10:00:00 8,2 - 8
3 Station1 2022-05-1 10:15:00 8,4 - 8
4 Station1 2022-05-1 10:30:00 8,9 - 8
5 Station1 2022-05-1 9:30:00 7,5 - 8
6 Station2 2022-05-1 9:45:00 7,56 - 6
7 Station2 2022-05-1 10:00:00 8,4 - 6
8 Station2 2022-05-1 10:15:00 8,7 - 6
9 Station2 2022-05-1 10:30:00 8,1 - 6
10 ...
I would recommend the following:
#generate dataframes df1 and df2
df1 <- data.frame(Station =c(rep("Station1",5), rep("Station2",5)),
Date = c("2022-05-1 9:30:00","2022-05-1 9:45:00","2022-05-1 10:00:00","2022-05-1 10:15:00", "2022-05-1 10:30:00",
"2022-05-1 9:30:00","2022-05-1 9:45:00","2022-05-1 10:00:00","2022-05-1 10:15:00", "2022-05-1 10:30:00"),
Temperature = c(7.4, 7.45, 8.2, 8.4, 8.9, 7.5, 7.56, 8.4, 8.7, 8.1))
df2 <- data.frame(Station=c(rep("Station1",5), rep("Station2",5)),
Date = c("2022-05-01","2022-05-02","2022-05-03","2022-05-04","2022-05-05",
"2022-05-01","2022-05-02","2022-05-03","2022-05-04","2022-05-05"),
AverageTemperaturePerDayAndStation =c(8, 8.3, 8.6, 8.4, 7.9, 6, 7.3, 8.6, 7.4, 6.9))
#save Time and date in separate columns
df1$Time <- format(as.POSIXct(df1$Date), format = "%H:%M:%S")
df1$Date <- as.Date(df1$Date)
#change format of Date in df2 with as.Date
df2$Date <- as.Date(df2$Date)
#use left join (i.e. keep all entries from the first dataframe) and join by both Date and Station
df3 <- dplyr::left_join(df1, df2, by =c("Date","Station"))
Then, you can calculate the new column CorrectedTemperature.
df3$CorrectedTemperature <- df3$Temperature - df3$AverageTemperaturePerDayAndStation
Hope this could help.