Search code examples
swiftperformanceswap

swapAt is faster than manual swap operation


swapAt() is faster than (nums[i], nums[j]) = (nums[j], nums[i]) operation, I wonder why?

The manual swap solution takes 44ms, but the swapAt one takes only 40ms. I wonder what is the difference?

The manual swap solution (44ms)

func moveZeroes(_ nums: inout [Int]) {
        var j = 0;
        for i in 0..<nums.count {
            if (nums[i] != 0) {
                (nums[i], nums[j]) = (nums[j], nums[i]) 
                j += 1
            }
        }
 }

The swapAt solution (40ms)

 func moveZeroes(_ nums: inout [Int]) {
        var j = 0;
        for i in 0..<nums.count {
            if (nums[i] != 0) {
                nums.swapAt(i, j) 
                j += 1
            }
        }
 }

Solution

  • First, a few general observations on benchmarking:

    • Use optimized builds (which I gather you’re doing, but I only mention for the sake of completeness)
    • Repeat the tests many times
    • Randomize the order that you perform the benchmarks and compare a few runs. (I personally use unit tests with the “Randomize Execution Order” feature turned on.)

    Second, I should say that the difference between the tuple and swapAt renditions was very minor. I looped through a billion times (array of 1000 items, repeated a million times) and the difference was still a fraction of a second.

    Third, I found that using tuples for the swapping was slightly faster than using swapAt when dealing with a large randomized data set where 10% of the data points were zeros, but a little slower when very few swaps were really needed.

    This latter point makes sense because swapAt will skip the swap operation if it’s not necessary: “Calling swapAt(_:_:) with the same index as both i and j has no effect.” We can see that swapAt has early exit of i and j are the same:

    @inlinable
    public mutating func swapAt(_ i: Index, _ j: Index) {
        guard i != j else { return }
        let tmp = self[i]
        self[i] = self[j]
        self[j] = tmp
    }
    

    If you modify your tuple routine to do the same check, the difference was diminished:

    func moveZeroesTuple(_ nums: inout [Int]) {
        var j = 0
        for i in 0..<nums.count where nums[i] != 0 {
            if i != j {
                (nums[i], nums[j]) = (nums[j], nums[i])
            }
            j += 1
        }
    }
    

    Frankly, I was little surprised that the tuple-based approach was as fast as it was, but looking at the assembly, the optimizer does a great job distilling this down to something fairly streamlined:

    0x100b5fb70 <+192>: movq   0x20(%rbx,%rdx,8), %rsi   ; retrieve nums[i]
    0x100b5fb75 <+197>: testq  %rsi, %rsi                ; is it zero? 
    0x100b5fb78 <+200>: je     0x100b5fb98               ; <+232> [inlined] protocol witness for Swift.Strideable.advanced(by: A.Stride) -> A in conformance Swift.Int : Swift.Strideable in Swift
    0x100b5fb7a <+202>: cmpq   %rcx, %rdx                ; is i = j?
    0x100b5fb7d <+205>: je     0x100b5fb93               ; <+227> [inlined] MyAppTests.MyAppTests.moveZeroesTuple(inout Swift.Array<Swift.Int>) -> () + 66 at MyAppTests.swift:120
    0x100b5fb7f <+207>: cmpq   (%rax), %rcx              ; are we done?
    0x100b5fb82 <+210>: jae    0x100b5fbc7               ; <+279> [inlined] generic specialization <Swift.Int> of Swift.Array._getElement(_: Swift.Int, wasNativeTypeChecked: Swift.Bool, matchingSubscriptCheck: Swift._DependenceToken) -> A
    0x100b5fb84 <+212>: movq   0x20(%rbx,%rcx,8), %rdi   ; retrieve nums[j]
    0x100b5fb89 <+217>: movq   %rdi, 0x20(%rbx,%rdx,8)   ; save it in nums[i]
    0x100b5fb8e <+222>: movq   %rsi, 0x20(%rbx,%rcx,8)   ; save previously retrieved nums[i] in nums[j]
    0x100b5fb93 <+227>: incq   %rcx                      ; j += 1
    0x100b5fb96 <+230>: jo     0x100b5fbc5               ; <+277> [inlined] MyAppTests.MyAppTests.moveZeroesTuple(inout Swift.Array<Swift.Int>) -> () at MyAppTests.swift:120
    0x100b5fb98 <+232>: incq   %rdx                      ; i += 1
    0x100b5fb9b <+235>: cmpq   %rdx, %r12                ; repeat for loop
    0x100b5fb9e <+238>: jne    0x100b5fb70               ; <+192> [inlined] generic specialization <Swift.Int> of Swift.Array._getElement(_: Swift.Int, wasNativeTypeChecked: Swift.Bool, matchingSubscriptCheck: Swift._DependenceToken) -> A + 15
    

    The optimized swapAt assembly was pretty much the same (thanks to the inlining), but with just a few more instructions in there:

    0x100b5d920 <+192>: movq   0x20(%rbx,%rdx,8), %rsi
    0x100b5d925 <+197>: testq  %rsi, %rsi
    0x100b5d928 <+200>: je     0x100b5d955               ; <+245> [inlined] protocol witness for Swift.Strideable.advanced(by: A.Stride) -> A in conformance Swift.Int : Swift.Strideable in Swift
    0x100b5d92a <+202>: cmpq   %rcx, %rdx
    0x100b5d92d <+205>: je     0x100b5d950               ; <+240> [inlined] MyAppTests.MyAppTests.moveZeroesSwapAt(inout Swift.Array<Swift.Int>) -> () + 79 at MyAppTests.swift:128
    0x100b5d92f <+207>: movq   (%rax), %rdi
    0x100b5d932 <+210>: cmpq   %rdi, %rdx
    0x100b5d935 <+213>: jae    0x100b5d984               ; <+292> [inlined] generic specialization <Swift.Int> of Swift.Array._getElement(_: Swift.Int, wasNativeTypeChecked: Swift.Bool, matchingSubscriptCheck: Swift._DependenceToken) -> A
    0x100b5d937 <+215>: cmpq   %rdi, %rcx
    0x100b5d93a <+218>: jae    0x100b5d986               ; <+294> [inlined] generic specialization <Swift.Int> of Swift.Array._getElement(_: Swift.Int, wasNativeTypeChecked: Swift.Bool, matchingSubscriptCheck: Swift._DependenceToken) -> A + 2
    0x100b5d93c <+220>: movq   0x20(%rbx,%rcx,8), %rdi
    0x100b5d941 <+225>: movq   %rdi, 0x20(%rbx,%rdx,8)
    0x100b5d946 <+230>: cmpq   (%rax), %rcx
    0x100b5d949 <+233>: jge    0x100b5d988               ; <+296> [inlined] generic specialization <Swift.Int> of Swift._ArrayBuffer._checkInoutAndNativeTypeCheckedBounds(_: Swift.Int, wasNativeTypeChecked: Swift.Bool) -> ()
    0x100b5d94b <+235>: movq   %rsi, 0x20(%rbx,%rcx,8)
    0x100b5d950 <+240>: incq   %rcx
    0x100b5d953 <+243>: jo     0x100b5d982               ; <+290> [inlined] MyAppTests.MyAppTests.moveZeroesSwapAt(inout Swift.Array<Swift.Int>) -> () at MyAppTests.swift:128
    0x100b5d955 <+245>: incq   %rdx
    0x100b5d958 <+248>: cmpq   %rdx, %r12
    0x100b5d95b <+251>: jne    0x100b5d920               ; <+192> [inlined] generic specialization <Swift.Int> of Swift.Array._getElement(_: Swift.Int, wasNativeTypeChecked: Swift.Bool, matchingSubscriptCheck: Swift._DependenceToken) -> A + 15
    

    Still, these few additional instructions are what appear to have given the tuple approach a slight edge in the “many swaps” scenario.

    Bottom line, in my benchmarks, the difference between these is negligible in anything but the most extraordinarily large data sets and isn’t worth worrying about, otherwise. And the performance characteristics depend upon the nature of the dataset, with no clear winner either way in the generalized scenario. And, quite frankly, if your data set was actually large enough to warrant picking one over the other, you really should consider a different algorithm altogether, anyway.


    If you’re interested, here are my unit tests, tested with “release” build, with “randomize execution order” turned on:

    import XCTest
    
    class MyAppTests: XCTestCase {
    
        static var originalArray: [Int]!
    
        let iterations = 1_000_000
    
        // fyi, use static to make sure all tests use the same original array
        // eliminating any discrepancies between different tests, within the same
        // test run, having different data and therefore different number of swaps.
    
        override static func setUp() {
            print("building array")
    
            // scenario 1: no swapping needed
            //
            // Array with 1000 integers, 10% which are zeros and the rest are non-zero
    
            originalArray = (0..<900).map { _ in Int.random(in: 1..<10) } + [Int](repeating: 0, count: 100)
    
            // scenario 2: some swapping needed
            //
            // if you don't want them randomized, comment the following line
    
            originalArray.shuffle()
    
            // scenario 3: since all zeros are at the start, the maximum amount of swapping
            //
            // if you want zeros at start, uncomment the following line
            //
            // originalArray.sort()
        }
    
        func moveZeroesTuple(_ nums: inout [Int]) {
            var j = 0
            for i in 0..<nums.count where nums[i] != 0 {
                if i != j {
                    (nums[i], nums[j]) = (nums[j], nums[i])
                }
                j += 1
            }
        }
    
        func moveZeroesShiftManual(_ nums: inout [Int]) {
            var i = 0
            let count = nums.count
            while i < count, nums[i] != 0 {
                i += 1
            }
            var j = i
            while i < count {
                if nums[i] != 0 {
                    nums[j] = nums[i]
                    j += 1
                }
                i += 1
            }
            while j < count {
                nums[j] = 0
                j += 1
            }
        }
    
        func moveZeroesSwapManual(_ nums: inout [Int]) {
            var j = 0
            for i in 0..<nums.count where nums[i] != 0 {
                if i != j {
                    let temp = nums[i]
                    nums[i] = nums[j]
                    nums[j] = temp
                }
                j += 1
            }
        }
    
        func moveZeroesSwapAt(_ nums: inout [Int]) {
            var j = 0
            for i in 0 ..< nums.count where nums[i] != 0 {
                nums.swapAt(i, j)
                j += 1
            }
        }
    
        // a quick test to make sure all solutions generate the same result
    
        func testLogic() {
            var arrayTuple = MyAppTests.originalArray!
            moveZeroesTuple(&arrayTuple)
    
            var arraySwapAt = MyAppTests.originalArray!
            moveZeroesSwapAt(&arraySwapAt)
    
            var arrayShiftManual = MyAppTests.originalArray!
            moveZeroesShiftManual(&arrayShiftManual)
    
            var arraySwapManual = MyAppTests.originalArray!
            moveZeroesSwapManual(&arraySwapManual)
    
            XCTAssertEqual(arrayTuple, arrayShiftManual)
            XCTAssertEqual(arrayTuple, arraySwapManual)
            XCTAssertEqual(arrayTuple, arraySwapAt)
        }
    
        // now the performance tests
    
        func testTuple() {
            measure {
                for _ in 0 ..< iterations {
                    var array = MyAppTests.originalArray!
                    moveZeroesTuple(&array)
                }
            }
        }
    
        func testSwapAt() {
            measure {
                for _ in 0 ..< iterations {
                    var array = MyAppTests.originalArray!
                    moveZeroesSwapAt(&array)
                }
            }
        }
    
        func testShiftManual() {
            measure {
                for _ in 0 ..< iterations {
                    var array = MyAppTests.originalArray!
                    moveZeroesShiftManual(&array)
                }
            }
        }
    
        func testSwapManual() {
            measure {
                for _ in 0 ..< iterations {
                    var array = MyAppTests.originalArray!
                    moveZeroesSwapManual(&array)
                }
            }
        }
    }
    

    Generally when doing tests on a release build, I’ll do something with the results just to ensure that some critical part of code wasn't entirely optimized away by the compiler. I didn’t need to do that here only because I carefully reviewed the assembly for the various approaches and I know that everything was preserved.

    Results