Create a for loop of average temperatures from a list (R)?


I have a list ("x") of files that have temperatures every ten minutes. Each ID is a separate entry in the list and has a group of associated temperatures. Here's a sample:

head(x)


$165.212

ID       Date     Time            DateTime       Temp
165.212 2023-07-18 10:31:00 2023-07-18 10:31:00 20.37
164.212 2023-07-18 10:41:00 2023-07-18 11:35:00 23.4
164.212 2023-07-18 10:51:00 2023-07-18 11:35:00 23.8
Lux      Fx Sex Sp   TimeOfDay                Hour
3060 164.212   M WT 10.58416667 2024-07-18 11:00:00
1287 164.212   M WT 10.75083333 2024-07-18 11:00:00
1128 164.212   M WT 10.91750000 2024-07-18 11:00:00


$164.314

ID       Date     Time            DateTime       Temp
164.314 2023-07-18 10:31:00 2023-07-18 11:35:00 32.5
164.314 2023-07-18 10:41:00 2023-07-18 11:35:00 33.2
164.314 2023-07-18 10:51:00 2023-07-18 11:35:00 22.8
Lux      Fx Sex Sp   TimeOfDay                Hour
3060 164.314   M WT 10.58416667 2024-07-18 11:00:00
1287 164.314   M WT 10.75083333 2024-07-18 11:00:00
2700 164.314   M WT 10.91750000 2024-07-18 11:00:00

I want to create a for loop that takes each entry in the list and averages the hourly and daily temperatures for that ID number, preferably producing a new list with the ID, the average hourly temperatures, and the date/time.

I have created a column ("Hour") that rounds each time to the nearest hour, and I would like to aggregate the data into groups based on these values.

Here is the code I'm using but I can't figure out why it's not working. I know I need to specify that I'm making a new list.

for (each in x) {
  hour_t$ = x %>%
    group_by(Hour) %>%
    summarise(AvgTemperature = mean(Temp, na.rm = TRUE))
}

Example data (best guess):

list(`165.212` = structure(list(ID = c("165.212", "164.212", 
"164.212"), Date = structure(c(19556, 19556, 19556), class = "Date"), 
    Time = c("10:31:00", "10:41:00", "10:51:00"), DateTime = structure(c(1689676260, 
    1689680100, 1689680100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    Temp = c(20.37, 23.4, 23.8), Lux = c(3060L, 1287L, 1128L), 
    Fx = c(164.212, 164.212, 164.212), Sex = c("M", "M", "M"), 
    Sp = c("WT", "WT", "WT"), TimeOfDay = c("10.58416667 2024-07-18", 
    "10.75083333 2024-07-18", "10.91750000 2024-07-18"), Hour = c("11:00:00", 
    "11:00:00", "11:00:00")), row.names = c(NA, -3L), class = "data.frame"), 
    `164.314` = structure(list(ID = c("164.314", "164.314", "164.314"
    ), Date = structure(c(19556, 19556, 19556), class = "Date"), 
        Time = c("10:31:00", "10:41:00", "10:51:00"), DateTime = structure(c(1689680100, 
        1689680100, 1689680100), class = c("POSIXct", "POSIXt"
        ), tzone = "UTC"), Temp = c(32.5, 33.2, 22.8), Lux = c(3060L, 
        1287L, 2700L), Fx = c(164.314, 164.314, 164.314), Sex = c("M", 
        "M", "M"), Sp = c("WT", "WT", "WT"), TimeOfDay = c("10.58416667 2024-07-18", 
        "10.75083333 2024-07-18", "10.91750000 2024-07-18"), 
        Hour = c("11:00:00", "11:00:00", "11:00:00")), row.names = c(NA, 
    -3L), class = "data.frame"))

Solution

  • This can be done in a single line. Use aggregate from base R and loop over your list (called data_list here) with lapply:

    > lapply(data_list, aggregate, Temp ~ Date + Hour + ID, mean)
    $`165.212`
            Date     Hour      ID  Temp
    1 2023-07-18 11:00:00 164.212 23.60
    2 2023-07-18 11:00:00 165.212 20.37
    
    $`164.314`
            Date     Hour      ID Temp
    1 2023-07-18 11:00:00 164.314 29.5 
    

    If you would rather have one list element per ID, bind the results together and split on ID (the |> pipe requires R 4.1 or later):

    lapply(unname(data_list), aggregate, Temp ~ Date + Hour + ID, mean) |>
      do.call(what = rbind) |> split(f = ~ID)
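
If you specifically want the for loop from the question, here is a minimal sketch with dplyr. The original loop fails for three reasons: `hour_t$ =` is not valid syntax, the body summarises the whole list `x` rather than the current element, and nothing is stored per iteration. The names `hourly`, `daily` and `nm` are made up for illustration, and the small `x` below is only a stand-in built from the example values, assuming `Hour` is stored as a character column as in the example data.

```r
library(dplyr)

# Stand-in for the list in the question (values copied from the example data)
x <- list(
  `165.212` = data.frame(
    ID   = c("165.212", "164.212", "164.212"),
    Date = as.Date(rep("2023-07-18", 3)),
    Hour = rep("11:00:00", 3),
    Temp = c(20.37, 23.4, 23.8)
  ),
  `164.314` = data.frame(
    ID   = rep("164.314", 3),
    Date = as.Date(rep("2023-07-18", 3)),
    Hour = rep("11:00:00", 3),
    Temp = c(32.5, 33.2, 22.8)
  )
)

# Pre-allocate result lists that carry the same names as x
hourly <- vector("list", length(x))
names(hourly) <- names(x)
daily <- hourly

# Loop over the names so each result can be stored under its own name
for (nm in names(x)) {
  # Hourly mean per ID and date
  hourly[[nm]] <- x[[nm]] %>%
    group_by(ID, Date, Hour) %>%
    summarise(AvgTemperature = mean(Temp, na.rm = TRUE), .groups = "drop")

  # Daily mean per ID
  daily[[nm]] <- x[[nm]] %>%
    group_by(ID, Date) %>%
    summarise(AvgTemperature = mean(Temp, na.rm = TRUE), .groups = "drop")
}
```

After the loop, `hourly[["164.314"]]` and `daily[["164.314"]]` hold the summaries for that entry. The lapply version above does the same job with less bookkeeping, so the loop is mainly useful if you need extra per-entry steps.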