Tags: r, performance, parallel-processing, distance, osrm

How to speed up code that finds shortest driving time between two sets of points using osrm


Say that I have two sets of coordinates, A and B. My goal is, for each element of A, to find the element of B with the shortest driving time (retaining B's index, driving time, and distance). Based on the answer by @Ben in this question (Calculating distance of multiple coordinates in R), I have come up with the code below. My question is how to make this faster.

library(osrm)
library(sf)

apotheke.df <- st_read(system.file("gpkg/apotheke.gpkg", package = "osrm"),
                       quiet = TRUE)

points <- data.frame(
    id = 1:3,
    longitude = c(13.4, 13.5, 13.3),
    latitude = c(52.4, 52.5, 52.3)
)

route <- list()

#overall goal: for each point, find the driving time to the nearest apotheke

#find distance from each of the 3 points to the first apotheke
for (x in 1:nrow(points)) {
    route[[x]] <- osrmRoute(src = c(points$longitude[x], points$latitude[x]),
                            dst = apotheke.df[1,],
                            overview = FALSE, osrm.profile = "car")
    #add index
    route[[x]][3] <- 1
}

#replace if duration is less than the lowest one
for (x in 1:nrow(points)) {
    for(y in 2:nrow(apotheke.df)) {
        temp <- osrmRoute(src = c(points$longitude[x], points$latitude[x]),
                          dst = apotheke.df[y,],
                          overview = FALSE, osrm.profile = "car")
        temp[3] <- y
        
        print(y)
        print(temp)
        if (temp[1] < route[[x]][1]) route[[x]] <- temp  #compare durations, not distances
    }
}

do.call(rbind, route)

Result (the unnamed third column is the index of the matched apotheke):

     duration distance   
[1,]     3.52     1.84 18
[2,]     2.05     1.00 14
[3,]    17.10    17.76 76

In my actual application, one set has about 150 points and the other has many thousands, so this takes a really long time. This leads to my question: how can I speed up the above code?

My best guess is to use parallel processing (though other aspects of the code may be slow), but I am not good at figuring out how to do this. It may be related to these questions: 1) parallelizing API in R (OSRM) and 2) How to apply an osrm function to every row of a dataframe.


Solution

  • Here is a reproducible example elaborating on my comment.

    1. Define sample source and dest data:

    This is just for the purpose of the answer.

    library(osrm)
    
    source <- data.frame(
        id = 1:3,
        longitude = c(13.4, 13.5, 13.3),
        latitude = c(52.4, 52.5, 52.3)
    )
    
    dest <- data.frame(
        id = 4:5,
        longitude = c(13.9, 13.6),
        latitude = c(52.2, 52.4)
    )
    

    2. Calculate distances

    You will save a huge amount of time using osrmTable() rather than looping through your data and calling osrmRoute() each time. This is particularly true if you are using a remote server rather than hosting your own, as each request travels over the internet. Using osrmTable() makes one request to the server for the entire table, rather than a request for every pair of coordinates.

    distances <- osrmTable(
        src = source[c("longitude", "latitude")],
        dst = dest[c("longitude", "latitude")],
        osrm.profile = "car"
    )
    
    distances[["durations"]]
    
    #      1    2
    # 1 60.6 25.7
    # 2 71.4 25.4
    # 3 55.7 25.3
    

    Even with just 100 source coordinates and 100 destinations, R needs to interpret one command, make one network request and do one high-level assignment (<-) operation, rather than 10,000 of each if you iterate over every pair individually.
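    If you want to see the difference for yourself, here is a rough timing sketch (it assumes the demo server is reachable, and the absolute numbers will depend almost entirely on network latency):

    # one request per pair: 3 x 2 = 6 round trips to the server
    system.time({
        for (i in seq_len(nrow(source))) {
            for (j in seq_len(nrow(dest))) {
                osrmRoute(
                    src = c(source$longitude[i], source$latitude[i]),
                    dst = c(dest$longitude[j], dest$latitude[j]),
                    overview = FALSE
                )
            }
        }
    })
    
    # one request for the whole table
    system.time(
        osrmTable(
            src = source[c("longitude", "latitude")],
            dst = dest[c("longitude", "latitude")]
        )
    )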

    3. Find the closest destination to each source

    Again, it is much quicker if you skip the loop. I am going to use data.table in the example because it's fast. First, make the duration matrix into a data.table and get it into long form using melt():

    library(data.table)
    
    distances_dt <- data.table(
        source = source$id,
        distances[["durations"]],
        key = "source"
    ) |>
        setnames(
            seq_len(nrow(dest)) + 1,
            as.character(dest$id)
        ) |>
        melt(
            id.vars = "source",
            variable.name = "dest",
            value.name = "duration"
        )
    
    #    source   dest duration
    #     <int> <fctr>    <num>
    # 1:      1      4     60.6
    # 2:      2      4     71.4
    # 3:      3      4     55.7
    # 4:      1      5     25.7
    # 5:      2      5     25.4
    # 6:      3      5     25.3
    

    Then we can simply find the minimum duration for each source:

    distances_dt[,
        .(min_duration = min(duration)),
        by = source
    ]
    
    #    source min_duration
    #     <int>        <num>
    # 1:      1         25.7
    # 2:      2         25.4
    # 3:      3         25.3
    
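    The question also asks to retain the index of the nearest destination. One way to keep it, sketched with the same distances_dt as above, is to take the whole row with the smallest duration per source rather than just the minimum value:

    distances_dt[,
        .SD[which.min(duration)],
        by = source
    ]
    
    #    source   dest duration
    #     <int> <fctr>    <num>
    # 1:      1      5     25.7
    # 2:      2      5     25.4
    # 3:      3      5     25.3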

    4. Setting up your own osrm-backend instance

    The only problem I can foresee is that you will hit a query size limit if you use the default R osrm package server, rather than a local one. According to the R osrm package docs:

    The OSRM demo server does not allow large queries (more than 10000 distances or durations).

    If you are not sure whether you are hosting your own server then you probably aren't. You can check the default server in your R session by running getOption("osrm.server"). The default is "https://routing.openstreetmap.de/".

    If you make a request larger than the max-table-size parameter on the server you are using, you will get this response:

    {"code":"TooBig","message":"Too many table coordinates"}
    

    If this is the case then, if practical, you could break your data up into smaller chunks.
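    For example, here is a minimal sketch of chunking the destinations (chunk_size is an arbitrary illustrative value; pick one so that nrow(source) * chunk_size stays under the server's limit):

    chunk_size <- 50
    chunks <- split(
        seq_len(nrow(dest)),
        ceiling(seq_len(nrow(dest)) / chunk_size)
    )
    
    # one osrmTable() request per chunk of destinations, then bind the
    # pieces back together into the full duration matrix
    durations <- do.call(cbind, lapply(chunks, function(idx) {
        osrmTable(
            src = source[c("longitude", "latitude")],
            dst = dest[idx, c("longitude", "latitude")],
            osrm.profile = "car"
        )[["durations"]]
    }))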

    Alternatively, you can run your own osrm-backend instance. If you are going to be calculating distances regularly, or even a lot of distances once, I would recommend doing this. There are two approaches:

    a. Install osrm-backend and change the max-table-size parameter (Linux/Mac)

    This is the tutorial I followed some years ago, which I think is a good way to do it. The instructions are for Ubuntu, although only minor changes are required to install on other popular Linux flavours or macOS.

    You need to change one line in the tutorial. Rather than running:

    osrm-routed map.xml.osrm
    

    You can run:

    osrm-routed map.xml.osrm --max-table-size 100000
    

    (Or whatever number is sufficiently large.)

    b. Expose the API in a Docker image (Windows/Linux/Mac)

    Alternatively, if you're running Windows, or you're comfortable with Docker, then it's probably easier to use the Docker images. Even if you haven't used Docker before, this is a good set of instructions.

    If you are running an osrm-backend instance in a Docker container, you can change the max-table-size parameter by passing it to docker run as described here, with syntax similar to:

    docker run -t -i -p 5000:5000 -v "${PWD}:/data" osrm/osrm-backend osrm-routed --algorithm mld --max-table-size 10000 /data/berlin-latest.osrm
    

    In either case, once you have set up your own osrm-routed instance, you need to tell the R osrm package to use it. If it is running on port 5000 of your localhost, you can do this with:

    options(osrm.server = "http://127.0.0.1:5000/")
    

    You can see an example of integrating a local osrm-routed instance with the R osrm package here.
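    For example, once a local instance is listening on port 5000, something like this (a sketch reusing the sample data from above) sends the whole table query to your own server, subject only to the max-table-size you configured:

    options(osrm.server = "http://127.0.0.1:5000/", osrm.profile = "car")
    
    distances <- osrmTable(
        src = source[c("longitude", "latitude")],
        dst = dest[c("longitude", "latitude")]
    )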