Search code examples
rdplyrr-maptools

use maptools::sunriset() inside mutate


I'm trying to use dplyr to calculate sunrise time for a set of lon/lat/timestamp coordinates, using the sunriset function from maptools. Here is a reproducible example.

library(maptools)
library(dplyr)

pts <- tbl_df(data.frame(
  lon=c(12.08752,12.08748,12.08754,12.08760,12.08746,12.08748),
  lat=c(52.11760,52.11760,52.11747,52.11755,52.11778,52.11753),
  timestamp=as.POSIXct(
    c("2011-08-12 02:00:56 UTC","2011-08-12 02:20:22 UTC",
      "2011-08-12 02:40:15 UTC","2011-08-12 03:00:29 UTC",
      "2011-08-12 03:20:26 UTC","2011-08-12 03:40:30 UTC"))
))

pts %>% mutate(sunrise=sunriset(as.matrix(lon,lat),
                                timestamp,POSIXct.out=T,
                                direction='sunrise')$time)

When I run this code, I get the error

"Error: invalid subscript type 'closure'"

I'm guessing this means that I'm not passing the variables into sunriset correctly.

This method does work, if I do it without dplyr

pts$sunrise<-sunriset(as.matrix(select(pts,lon,lat)), 
                    pts$timestamp, POSIXct.out=T, 
                    direction='sunrise')$time

However, I have lots of rows (around 65 million) and the above method is really slow even with a small fraction of that. I'm hoping that dplyr will be faster. If anybody has other suggestions on what method might be fastest, I'd love to hear them.


Solution

  • sunr <- function(lon, lat, ts, dir='sunrise') {
      # can also do matrix(c(pts$lon, pts$lat), ncol=2, byrow=TRUE) vs 
      # as.matrix(data.frame…
      sunriset(as.matrix(data.frame(lon, lat)), ts, POSIXct.out=TRUE, direction=dir)$time
    }
    
    pts %>% mutate(sunrise = sunr(lon, lat, timestamp))
    

    is one way to handle it (and has the side-effect of cleaner mutate pipelines) but I'm not sure why you think it will be faster. Either way, the bottleneck is (most likely) the creation of the matrix for the call to sunriset which is going to happen either way.

    The maptools source is pretty easy to go through and has a non-exported function maptools:::.sunrisetUTC() that does:

    ".sunrisetUTC" <- function(jd, lon, lat, direction=c("sunrise", "sunset")) {
    ## Value: Numeric, UTC time of sunrise or sunset, in minutes from zero
    ## Z.
    ## --------------------------------------------------------------------
    ## Arguments: jd=julian day (real);
    ## lon=lat=longitude and latitude, respectively, of the observer in
    ## degrees;
    ## sunrise=logical indicating whether sunrise or sunset UTC should be
    ## returned.
    

    You could try doing passing in the julian day, lon, lat & direction to it vs the exported functions to avoid the data copying. However, if performance is critical, I'd use Rcpp to write an inline, vectorized C/C++ function based on this.