r dataframe tidyr data-manipulation data-mining

Make a new column based on a condition in existing column in R?

I have weather recorded at daily intervals. I need to make new columns based on existing columns mean_rh, mean_temp and mean_ws. For each year, I need to calculate the exposure duration for each unique combination of temperature/humidity/wind speed values values. For instance, I would like to know how many days a mean_ws of 0.06 was recorded in 2006?

Example raw data

I can say that 0.06 was recorded for only one day.

Here is the reproducible example for the above data

library(dplyr)

# create example data frame
df <- data.frame(
  year = c(2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006),
  date = c("11/9/06", "12/9/06", "13/9/06", "14/9/06", "15/9/06", "16/9/06", "17/9/06", "18/9/06", "19/9/06", "20/9/06", "21/9/06", "22/9/06", "23/9/06", "24/9/06", "25/9/06", "26/9/06", "27/9/06", "28/9/06", "29/9/06", "30/9/06"),
  mean_rh = c(69.49, 69.05, 68.47, 65.99, 65.53, 67.07, 72.42, 91.79, 90.13, 76.88, 67.68, 61.11, 70.19, 66.88, 76.63, 89.09, 84.97, 80.08, 79.82, 82.81),
  mean_temp = c(18.05, 20.23, 20.93, 20.58, 18.64, 18.74, 19.13, 15.98, 14.62, 14.06, 17.03, 19.76, 18.64, 18.27, 16.96, 16.31, 14.97, 15.69, 16.34, 16.02),
  mean_ws = c(0.84, 0.33, 0.46, 0.79, 2.25, 1.95, 0.13, 0.06, 1.10, 0.90, 1.10, 1.20, 0.33, 0.36, 0.27, 0.66, 0.22, 0.62, 0.33, 0.18)
)

# convert the date column to a date format
df$date <- as.Date(df$date, format = "%d/%m/%y")

Taking temperature as example, I want something like this (a sum of number of days for each temperature in each year)

I am using the following code, but my calculations are not correct, when I match them with the raw data. My code needs fixing.

# calculate the exposure duration for each unique `mean_temp` value
df2 <- df %>%
  group_by(year, mean_temp) %>%
  summarise(exposure_duration = difftime(max(date), min(date), units = "days")) %>%
  ungroup()

# spread the data to have separate columns for each unique `mean_temp` value
df3 <- df2 %>%
  spread(key = mean_temp, value = exposure_duration, fill = 0)

Above code is just for mean_temp, but I would like to group_by mean_ws and mean_rh too. Thank you!

Solution

Task as I understand: I would like to know how many days a mean_ws of 0.06 was recorded in 2006?

This can be done with xtabs. As your data has only data for 2006, I have changed the first years to 2005.

df <- data.frame(
  year = c(
    2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006),
  date = c("11/9/06", "12/9/06", "13/9/06", "14/9/06", "15/9/06", "16/9/06", "17/9/06", "18/9/06", "19/9/06", "20/9/06", "21/9/06", "22/9/06", "23/9/06", "24/9/06", "25/9/06", "26/9/06", "27/9/06", "28/9/06", "29/9/06", "30/9/06"),
  mean_rh = c(69.49, 69.05, 68.47, 65.99, 65.53, 67.07, 72.42, 91.79, 90.13, 76.88, 67.68, 61.11, 70.19, 66.88, 76.63, 89.09, 84.97, 80.08, 79.82, 82.81),
  mean_temp = c(18.05, 20.23, 20.93, 20.58, 18.64, 18.74, 19.13, 15.98, 14.62, 14.06, 17.03, 19.76, 18.64, 18.27, 16.96, 16.31, 14.97, 15.69, 16.34, 16.02),
  mean_ws = c(0.84, 0.33, 0.46, 0.79, 2.25, 1.95, 0.13, 0.06, 1.10, 0.90, 1.10, 1.20, 0.33, 0.36, 0.27, 0.66, 0.22, 0.62, 0.33, 0.18)
)

# convert the date column to a date format
df$date <- as.Date(df$date, format = "%d/%m/%y")


xtabs(~year + mean_temp, data = df)
#>       mean_temp
#> year   14.06 14.62 14.97 15.69 15.98 16.02 16.31 16.34 16.96 17.03 18.05 18.27
#>   2005     0     0     0     0     1     0     0     0     0     0     1     0
#>   2006     1     1     1     1     0     1     1     1     1     1     0     1
#>       mean_temp
#> year   18.64 18.74 19.13 19.76 20.23 20.58 20.93
#>   2005     1     1     1     0     1     1     1
#>   2006     1     0     0     1     0     0     0