I am new to purrr and struggling to understand how to append the result of my function onto my dataframe (and get the best performance, since my dataframe is large).
I'm attempting to calculate sunrise time for each row in a dataframe:
library(tidyverse)
library(StreamMetabolism)
test <- structure(list(Latitude = c(44.49845, 42.95268, 42.95268, 44.49845,
44.49845, 44.49845), Longitude = c(-78.19259, -81.36935, -81.36935, -78.19259,
-78.19259, -78.19259), date = c("2014/02/12", "2014/01/24", "2014/01/08",
"2014/01/11", "2014/01/10", "2014/01/07"), timezone = c("EST5EDT", "EST5EDT",
"EST5EDT", "EST5EDT", "EST5EDT", "EST5EDT")), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -6L))
sunRise <- function(Latitude, Longitude, date, timezone){
print(sunrise.set(Latitude, Longitude, date, timezone, num.days = 1)[1,1])
}
I got this far, which gets me the desired sunrise times:
test %>%
pwalk(sunRise)
[1] "2014-02-12 07:17:09 EST"
[1] "2014-01-24 07:47:55 EST"
[1] "2014-01-08 07:56:13 EST"
[1] "2014-01-11 07:47:38 EST"
[1] "2014-01-10 07:47:59 EST"
[1] "2014-01-07 07:48:48 EST"
But I can't seem to figure out how to get the results of my function appended on to the end of the "test" dataframe, say as another variable called "sunrise_time"...
test %>%
mutate(sunrisetime = pwalk(sunRise))
Error in mutate_impl(.data, dots) : Evaluation error: argument ".f" is missing, with no default.
Sidebar: if you can recommend a good purrr tutorial that worked for you, please include it in your answer!! There seems to be a lot to know about purrr and I'm not sure what to focus on as a first-timer.
You don't really need purrr here. Here's a dplyr approach:
library(dplyr)
library(StreamMetabolism)
# updated function
sunRise <- function(Latitude, Longitude, date, timezone){
sunrise.set(Latitude, Longitude, date, timezone, num.days = 1)[1,1]
}
test %>%
rowwise() %>%
mutate(sunrise_time = sunRise(Latitude, Longitude, date, timezone)) %>%
ungroup()
# # A tibble: 6 x 5
# Latitude Longitude date timezone sunrise_time
# <dbl> <dbl> <chr> <chr> <dttm>
# 1 44.5 -78.2 2014/02/12 EST5EDT 2014-02-12 07:17:09
# 2 43.0 -81.4 2014/01/24 EST5EDT 2014-01-24 07:47:55
# 3 43.0 -81.4 2014/01/08 EST5EDT 2014-01-08 07:56:13
# 4 44.5 -78.2 2014/01/11 EST5EDT 2014-01-11 07:47:38
# 5 44.5 -78.2 2014/01/10 EST5EDT 2014-01-10 07:47:59
# 6 44.5 -78.2 2014/01/07 EST5EDT 2014-01-07 07:48:48
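The reason for rowwise() is that mutate() normally hands whole columns to the function, and a function written for one location and date at a time will not cope with that. Here is a minimal sketch of the difference, using a made-up scalar-only helper (scalar_only is hypothetical and only stands in for a function like sunRise that is assumed to expect single values):

```r
library(dplyr)

# Hypothetical helper: refuses anything but length-1 inputs,
# standing in for a function that is not vectorised.
scalar_only <- function(x, y) {
  stopifnot(length(x) == 1, length(y) == 1)
  x + y
}

df <- tibble(x = c(1, 2, 3), y = c(10, 20, 30))

# Without rowwise(), mutate() passes the full columns, so the helper errors.
whole_column <- tryCatch(
  mutate(df, total = scalar_only(x, y)),
  error = function(e) "error"
)

# With rowwise(), each call sees the values of a single row.
per_row <- df %>%
  rowwise() %>%
  mutate(total = scalar_only(x, y)) %>%
  ungroup()
```

The ungroup() at the end drops the row-wise grouping so later verbs behave normally again.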
Or, if you want to use purrr, you can do:
library(tidyverse)
test %>%
group_by(id = row_number()) %>%
nest() %>%
mutate(sunrise_time = map(data, ~sunRise(.x$Latitude, .x$Longitude, .x$date, .x$timezone))) %>%
unnest()
# # A tibble: 6 x 6
# id sunrise_time Latitude Longitude date timezone
# <int> <dttm> <dbl> <dbl> <chr> <chr>
# 1 1 2014-02-12 07:17:09 44.5 -78.2 2014/02/12 EST5EDT
# 2 2 2014-01-24 07:47:55 43.0 -81.4 2014/01/24 EST5EDT
# 3 3 2014-01-08 07:56:13 43.0 -81.4 2014/01/08 EST5EDT
# 4 4 2014-01-11 07:47:38 44.5 -78.2 2014/01/11 EST5EDT
# 5 5 2014-01-10 07:47:59 44.5 -78.2 2014/01/10 EST5EDT
# 6 6 2014-01-07 07:48:48 44.5 -78.2 2014/01/07 EST5EDT
You can remove the id column if you want.
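As for why the original mutate(sunrisetime = pwalk(sunRise)) failed: pwalk() takes the list of inputs first and the function second, so sunRise was treated as the input and .f was left empty. pwalk() is also meant for side effects and returns its input rather than the results, so pmap() is the one to use inside mutate(). A rough sketch of that pattern follows, with a stand-in function so it runs without StreamMetabolism (fake_sunrise is hypothetical and just returns 07:00 on the given date; swap in the non-printing sunRise for real use):

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for sunRise(): returns a single POSIXct per call.
fake_sunrise <- function(Latitude, Longitude, date, timezone) {
  as.POSIXct(paste(date, "07:00:00"),
             format = "%Y/%m/%d %H:%M:%S", tz = timezone)
}

df <- tibble(
  Latitude  = c(44.49845, 42.95268),
  Longitude = c(-78.19259, -81.36935),
  date      = c("2014/02/12", "2014/01/24"),
  timezone  = c("EST5EDT", "EST5EDT")
)

result <- df %>%
  # pmap() walks the four columns in parallel and returns a list,
  # one element per row, matched to the function's arguments by position.
  mutate(sunrise_time = pmap(list(Latitude, Longitude, date, timezone),
                             fake_sunrise)) %>%
  # Collapse the list of single date-times into an ordinary POSIXct column.
  mutate(sunrise_time = do.call(c, sunrise_time))
```

This keeps the original column order and avoids the extra id column, at the cost of one more step to flatten the list column.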
Or, you can slightly change your function and do this:
# update function
sunRise <- function(Latitude, Longitude, date, timezone){
return(list(sunrise_time = sunrise.set(Latitude, Longitude, date, timezone, num.days = 1)[1,1]))
}
# apply function to each row and create a dataframe
# bind columns with original dataset
pmap_df(test, sunRise) %>%
cbind(test, .)
# Latitude Longitude date timezone sunrise_time
# 1 44.49845 -78.19259 2014/02/12 EST5EDT 2014-02-12 07:17:09
# 2 42.95268 -81.36935 2014/01/24 EST5EDT 2014-01-24 07:47:55
# 3 42.95268 -81.36935 2014/01/08 EST5EDT 2014-01-08 07:56:13
# 4 44.49845 -78.19259 2014/01/11 EST5EDT 2014-01-11 07:47:38
# 5 44.49845 -78.19259 2014/01/10 EST5EDT 2014-01-10 07:47:59
# 6 44.49845 -78.19259 2014/01/07 EST5EDT 2014-01-07 07:48:48