Search code examples
swiftfirebasegoogle-cloud-firestorelocationgeohashing

How to find documents in an x mile radius using geohashes without filtering on client?


So currently I am using geohashes to do location based queries as such (following this stackoverflow post: Finding geohashes of certain length within radius from a point)

public extension CLLocationCoordinate2D {

    func boundingBox(radius: CLLocationDistance) -> (max: CLLocationCoordinate2D, min: CLLocationCoordinate2D) {
        // 0.0000089982311916 ~= 1m
        let offset = 0.0000089982311916 * radius
        let latMax = self.latitude + offset
        let latMin = self.latitude - offset
        
        // 1 degree of longitude = 111km only at equator
        // (gradually shrinks to zero at the poles)
        // So need to take into account latitude too
        let lngOffset = offset * cos(self.latitude * .pi / 180.0)
        let lngMax = self.longitude + lngOffset
        let lngMin = self.longitude - lngOffset
        
        
        let max = CLLocationCoordinate2D(latitude: latMax, longitude: lngMax)
        let min = CLLocationCoordinate2D(latitude: latMin, longitude: lngMin)
        
        return (max, min)
    }
func isWithin(min: CLLocationCoordinate2D, max: CLLocationCoordinate2D) -> Bool {
        return
            self.latitude > min.latitude &&
                self.latitude < max.latitude &&
                self.longitude > min.longitude &&
                self.longitude < max.longitude
    }

}
func getGeohashPrefix(){
        let loc = CLLocationCoordinate2D(latitude: lat!, longitude: long!)
        MBR = loc.boundingBox(radius: 16093.4) //16093.4 meters = 10 miles
        //corners = [NorthWest, SouthWest, SouthEast, NorthEast] in lat n long
        let corners = [CLLocationCoordinate2D(latitude: MBR.0.latitude,longitude: MBR.1.longitude),
                       MBR.1, CLLocationCoordinate2D(latitude: MBR.1.latitude, longitude: MBR.0.longitude),
                        MBR.0]
        var geohashes_of_corners: [String] = []
        for corner in corners {
            geohashes_of_corners.append(corner.geohash(length: 12))
        }
        geohashes_prefix = geohashes_of_corners.longestCommonPrefix()

    }

var query: Query = db.collection("Users").whereField("geohash",isGreaterThanOrEqualTo: geohashes_prefix).whereField("geohash",isLessThanOrEqualTo: geohashes_prefix + "~").order(by: "geohash", descending: false)

       query.getDocuments { (querySnapshot, err) in
           if err != nil{
               print("error getting da documents")
           }else{
                if querySnapshot!.isEmpty{
                    return completion(arr_of_people)
                }
                for document in querySnapshot!.documents {
                    let d = document.data()
                    let isPersonWithin = CLLocationCoordinate2D(latitude: (d["loc"] as! GeoPoint).latitude, longitude: (d["loc"] as! GeoPoint).longitude).isWithin(min: self.MBR.1, max: self.MBR.0)
                  if !isPersonWithin{
                         continue
                 }
 
                    arr_of_people.append([d["firstName"] as! String,  d["lastName"] as! String])
                   }

               return completion(arr_of_people)
           }
       }

As you can see, I am querying for documents with a specific prefix and then filtering those documents AGAIN on the client. Is that safe? If not, what is the workaround? Use cloud functions, different algorithm (suggest one if you have it), or something else?


Solution

  • A query on geohashes returns points within a certain range of geohashes, which are (somewhat) rectangular regions.

    A geoquery on a central point and a distance returns points that are in a circle.

    Since the two shapes are different, your code uses a client-side check to cut off the points that are outside the circle, but inside the rectangles. This is a normal step when performing geoqueries for a max distance around a point when using geohashes.


    Here's an example of this on a map:

    enter image description here

    The green pins are in a 250km circle around San Francisco, which is what I queried for. The red pins are outside of that circle, but within the set of geohash ranges ([["9q0","9qh"],["9nh","9n~"],["9r0","9rh"],["9ph","9p~"]] here) that needed to be queried to be sure we had all points in range.

    As said: this so-called oversampling is inherent to using geohashes to perform point-and-distance queries. In my experience you'll end up reading between 2x and 8x too many documents.

    It might be possible to reduce the number of false positives by querying for more-but-smaller ranges, but I don't know of any geo-libraries for Firestore that do that.

    I mapped the extra cost as a mental model: finding documents within a certain distance from a point costs me 2x to 8x more than a regular document read.


    Moving the operation to Cloud Functions on the server makes no difference on the number of documents that needs to be read, it just changes where they are read. So you can perform the operation on a server to reduce the bandwidth of transferring the documents from the database to the client. But it won't make a difference in the number of documents that need to be read.

    As discussed in comments: performing the query on the server does allow you to remove client-side access from the data, so that you can ensure the application code will never see documents that are not within the requested range. So if you're worried about document access, performing the query in a trusted environment (such as a server you control or Cloud Functions) is a good option.


    To not pay for the extra document reads, consider finding a service that natively supports geoqueries with a pricing model based on the number of results it returns - instead of the number of results it has to consider. Such a service will (very likely) still consider too many points, but if the pricing model matches what you want that might be worth it.