I'm trying to use dplyr
to calculate sunrise time for a set of lon/lat/timestamp coordinates, using the sunriset
function from maptools. Here is a reproducible example.
library(maptools)
library(dplyr)
pts <- tbl_df(data.frame(
lon=c(12.08752,12.08748,12.08754,12.08760,12.08746,12.08748),
lat=c(52.11760,52.11760,52.11747,52.11755,52.11778,52.11753),
timestamp=as.POSIXct(
c("2011-08-12 02:00:56 UTC","2011-08-12 02:20:22 UTC",
"2011-08-12 02:40:15 UTC","2011-08-12 03:00:29 UTC",
"2011-08-12 03:20:26 UTC","2011-08-12 03:40:30 UTC"))
))
pts %>% mutate(sunrise=sunriset(as.matrix(lon,lat),
timestamp,POSIXct.out=T,
direction='sunrise')$time)
When I run this code, I get the error
"Error: invalid subscript type 'closure'"
I'm guessing this means that I'm not passing the variables into sunriset
correctly.
This method does work, if I do it without dplyr
pts$sunrise<-sunriset(as.matrix(select(pts,lon,lat)),
pts$timestamp, POSIXct.out=T,
direction='sunrise')$time
However, I have lots of rows (around 65 million) and the above method is really slow even with a small fraction of that. I'm hoping that dplyr will be faster. If anybody has other suggestions on what method might be fastest, I'd love to hear them.
sunr <- function(lon, lat, ts, dir='sunrise') {
# can also do matrix(c(pts$lon, pts$lat), ncol=2, byrow=TRUE) vs
# as.matrix(data.frame…
sunriset(as.matrix(data.frame(lon, lat)), ts, POSIXct.out=TRUE, direction=dir)$time
}
pts %>% mutate(sunrise = sunr(lon, lat, timestamp))
is one way to handle it (and has the side-effect of cleaner mutate
pipelines) but I'm not sure why you think it will be faster. Either way, the bottleneck is (most likely) the creation of the matrix for the call to sunriset
which is going to happen either way.
The maptools
source is pretty easy to go through and has a non-exported function maptools:::.sunrisetUTC()
that does:
".sunrisetUTC" <- function(jd, lon, lat, direction=c("sunrise", "sunset")) {
## Value: Numeric, UTC time of sunrise or sunset, in minutes from zero
## Z.
## --------------------------------------------------------------------
## Arguments: jd=julian day (real);
## lon=lat=longitude and latitude, respectively, of the observer in
## degrees;
## sunrise=logical indicating whether sunrise or sunset UTC should be
## returned.
You could try doing passing in the julian day, lon, lat & direction to it vs the exported functions to avoid the data copying. However, if performance is critical, I'd use Rcpp
to write an inline, vectorized C/C++ function based on this.