Search code examples
iosswiftios-multithreading

How to utilize multiple processor cores in iOS to achieve the fastest/shortest computation of a loop accessing shared data?


When running a for-loop over an array in Swift running on iOS, I'd like to utilize the entire CPU for I feel that this should in theory speed up the computation. However my result is the opposite, that in running everything on a single DispatchQueue instead of a multiple of DispatchQueues, actually performs faster. I will provide an example and would like to know why the single thread approach is faster and if possible I could still lower the time needed for calculation by utilizing multiple cpu cores correctly?

For those just wanting to see the code of my intention may skip ahead as this next section only elaborates my intention and approach.

My Intention in the example provided:

I am determining if a line on a map (trace of a car ride, latitude and longitude for every second) is within a certain predefined region, a polygon on the map (latitudes and longitudes that go around the entire region). For this I have a function which calculates if a single point is within the polygon. I am using a for loop in order to iterate over every location in the traced line of the car ride, and with that function check, if the point is within the polygon. If every traced location is within the polygon, the entire traced car ride, took place within a said region.

I am using an iPhone X for development purposes, and which to utilize the entire CPU to hasten this calculation.

My approach:

In the examples provided I have 3 variants that result in the following times needed for calculation (in seconds):

Time elapsed for single thread variant: 6.490409970283508 s.
Time elapsed for multi thread v1 variant: 24.076722025871277 s.
Time elapsed for multi thread v2 variant: 23.922222018241882 s.

The first approach is the simplest, that is not using multiple DispatchQueue.

The second approach makes use of DispatchQueue.concurrentPerform(iterations: Int). I felt as though this might be the optimal solution for my need, as its already implemented and appears to be made for my exact purpose.

The third approach is my own and it schedules roughly equal parts of the array to for-loops running on DispatchQueues based on the number of active CPU cores reported by the OS.

I have also tried variants that make use of inout parameters (call by reference) but to no avail. The times stayed the same, thus I'm not providing more code to clutter the question with.

I am also aware that I could return the function as soon as I find a single point not within the polygon, but that is not part of this question.

My code:

    /**
    Function that calculates wether or not a 
    single coordinate is within a polygon described
    as a pointlist. 
    This function is used by all others to do the work.
    */
    private static func contains(coordinate: CLLocationCoordinate2D, with pointList: [CLLocationCoordinate2D]) -> Bool {
        var isContained = false
        var j = pointList.count - 1
        let lat = coordinate.latitude
        let lon = coordinate.longitude
        for i in 0 ..< pointList.count {

            if (pointList[i].latitude > lat) != (pointList[j].latitude > lat) &&
                (lon < (pointList[j].longitude - pointList[i].longitude) * (lat - pointList[i].latitude) / (pointList[j].latitude - pointList[i].latitude) + pointList[i].longitude) {
                isContained.toggle()
            }
            j = i
        }
        return isContained
    }

///Runs all three variants as are described in the question
    static func testAllVariants(coordinates: [CLLocationCoordinate2D], areInside pointList: [CLLocationCoordinate2D]) {
        var startTime = CFAbsoluteTimeGetCurrent()
        var bool = contains_singleThread(coordinates: coordinates, with: pointList)
        var timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
        print("Time elapsed for single thread variant: \(timeElapsed) s.")

        startTime = CFAbsoluteTimeGetCurrent()
        bool = contains_multiThread_v1(coordinates: coordinates, with: pointList)
        timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
        print("Time elapsed for multi thread v1 variant: \(timeElapsed) s.")

        startTime = CFAbsoluteTimeGetCurrent()
        bool = contains_multiThread_v2(coordinates: coordinates, with: pointList)
        timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
        print("Time elapsed for multi thread v2 variant: \(timeElapsed) s.")
    }

    private static func contains_singleThread(coordinates: [CLLocationCoordinate2D], with pointList: [CLLocationCoordinate2D]) -> Bool {
        var bContainsAllPoints = true
        for coordinate in coordinates {
            if !contains(coordinate: coordinate, with: pointList) {
                bContainsAllPoints = false
            }
        }
        return bContainsAllPoints
    }

    private static func contains_multiThread_v1(coordinates: [CLLocationCoordinate2D], with pointList: [CLLocationCoordinate2D]) -> Bool {
        let numOfCoordinates = coordinates.count
        var booleanArray = Array(repeating: true, count: numOfCoordinates)
        DispatchQueue.concurrentPerform(iterations: numOfCoordinates) { (index) in
            if !contains(coordinate: coordinates[index], with: pointList) {
                booleanArray[index] = false
            }
        }
        return !booleanArray.contains(false)
    }

    private static func contains_multiThread_v2(coordinates: [CLLocationCoordinate2D], with pointList: [CLLocationCoordinate2D]) -> Bool {
        let numOfCoordinates = coordinates.count
        let coreCount = ProcessInfo().activeProcessorCount

        func chunk<T>(array: [T], into size: Int) -> [[T]] {
            return stride(from: 0, to: array.count, by: size).map {
                Array(array[$0 ..< Swift.min($0 + size, array.count)])
            }
        }

        let segments = chunk(array: coordinates, into: numOfCoordinates/coreCount)

        let dg = DispatchGroup()
        for i in 0..<segments.count {
            dg.enter()
        }

        var booleanArray = Array(repeating: true, count: segments.count)

        for (index, segment) in segments.enumerated() {
            DispatchQueue.global(qos: .userInitiated).async {
                for coordinate in segment {
                    if !contains(coordinate: coordinate, with: pointList) {
                        booleanArray[index] = false
                    }
                }
                dg.leave()
            }
        }

        dg.wait()
        return !booleanArray.contains(false)
    }

Example data

I have uploaded two json files for those wishing to have data to run tests with. It is the same input that has resulted in my recorded times.

The traced car ride: Link to json File The region/area: Link to json File


Solution

  • I solved the problem, thanks to the community. This answer encompasses the various results brought forth by the comment section.

    There are two ways, one is to use pointers, which is the more general approach. The other is more specific to my problem and makes use of the GPU to see if a multiple of points is within a predefined polygon. Either way, here are both ways as code speaks more than words ;).

    Make use of pointers (Note: The basic "contains/containsWithPointer" function can be found in the question):

    private static func contains_multiThread(coordinates: [CLLocationCoordinate2D], with pointList: [CLLocationCoordinate2D]) -> Bool {
            let numOfCoordinates = coordinates.count
            var booleanArray = Array(repeating: true, count: numOfCoordinates)
            let coordinatePointer: UnsafeBufferPointer<CLLocationCoordinate2D> = {
                return coordinates.withUnsafeBufferPointer { pointer -> UnsafeBufferPointer<CLLocationCoordinate2D> in
                    return pointer
                }
            }()
            let pointListPointer: UnsafeBufferPointer<CLLocationCoordinate2D> = {
                return pointList.withUnsafeBufferPointer { pointer -> UnsafeBufferPointer<CLLocationCoordinate2D> in
                    return pointer
                }
            }()
            let booleanPointer: UnsafeMutableBufferPointer<Bool> = {
                return booleanArray.withUnsafeMutableBufferPointer { pointer -> UnsafeMutableBufferPointer<Bool> in
                    return pointer
                }
            }()
    
            DispatchQueue.concurrentPerform(iterations: numOfCoordinates) { (index) in
                if !containsWithPointer(coordinate: coordinatePointer[index], with: pointListPointer) {
                    booleanPointer[index] = false
                }
            }
    
        return !booleanArray.contains(false)
    }
    

    Make use of GPU:

    private static func contains_gpu(coordinates: [CLLocationCoordinate2D], with pointList: [CLLocationCoordinate2D]) -> Bool {
            let regionPoints = pointList.compactMap {CGPoint(x: $0.latitude, y: $0.longitude)}
            let trackPoints = coordinates.compactMap {CGPoint(x: $0.latitude, y: $0.longitude)}
    
            let path = CGMutablePath()
            path.addLines(between: regionPoints)
            path.closeSubpath()
    
            var flag = true
            for point in trackPoints {
                if !path.contains(point) {
                    flag = false
                }
            }
    
            return flag
        }
    

    Which function is faster depends on the system, count of points and complexity of the polygon. My results are that the multithread variant is roughly 30% faster but when the polygon is fairly simple or the point count goes into the millions the gap closes and eventually the gpu variant becomes faster. Who knows, you might get even better results combining the two for this specific problem.