If I try to initialize a Swift Data struct with a relatively large MutableRandomAccessSlice<Data>, the program's memory use grows large and it takes a long time to finish. However, doing the same thing in Objective-C with NSData does not appear to have the same problem.
For example, with the following code:
let startData = Data(count: 100_000_000)
let finalData = Data(startData[0..<95_234_877])
if I compile it using:
xcrun swiftc -O -sdk `xcrun --show-sdk-path --sdk macosx` -o output main.swift
the execution (on my MacBook Air 2011) takes a long time to finish (87s) and the memory usage is through the roof (up to 625MB, as shown below):
$ time ./output
./output 85.21s user 1.29s system 99% cpu 1:26.91 total
$ top -o MEM
PID COMMAND %CPU TIME #TH #WQ #PORT MEM PURG CMPRS PGRP PPID STATE
38156 output 99.0 01:25.57 1/1 0 10 625M+ 0B 992M+ 38156 36025 running
If I profile each step, it takes about 0.00015s to create startData, 0.000007s to create the slice from startData, and the rest of the time to initialize finalData.
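For reference, per-step timings like these could be collected with something along the following lines. This is only a minimal sketch: the `timed` helper is a hypothetical name introduced here for illustration, not part of Foundation, and wall-clock timing with `Date` is an assumption about how the measurement was done.

```swift
import Foundation

// Hypothetical helper (not from the original post): runs a closure,
// prints the elapsed wall-clock time in seconds, and returns the result.
func timed<T>(_ label: String, _ work: () -> T) -> T {
    let start = Date()
    let result = work()
    print("\(label): \(Date().timeIntervalSince(start))s")
    return result
}

// The same three steps as in the example above, timed one by one.
let startData = timed("create startData") { Data(count: 100_000_000) }
let slice = timed("create slice") { startData[0..<95_234_877] }
let finalData = timed("initialize finalData") { Data(slice) }
```

The absolute numbers will depend on the machine and the Swift version, so only the relative cost of the three steps is meaningful.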
If I do the same thing in Objective-C:
NSData *startData = [[NSMutableData alloc] initWithLength:100000000];
NSData *finalData = [startData subdataWithRange:NSMakeRange(0, 95234877)];
it only takes roughly 0.00017s.
Am I doing something wrong in the Swift example? There seems to be a very large discrepancy between the two.
As you have found, the Objective-C code [startData subdataWithRange:NSMakeRange(0, 95234877)] is equivalent to startData.subdata(in: 0..<95_234_877).
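Applied to the example from the question, that would look roughly like the sketch below. It assumes only Foundation's `Data.subdata(in:)`; the variable names are the ones used in the question.

```swift
import Foundation

let startData = Data(count: 100_000_000)

// subdata(in:) copies the requested byte range into a new Data value,
// rather than going through the generic Sequence-based initializer.
let finalData = startData.subdata(in: 0..<95_234_877)

print(finalData.count)
```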
When you write Data(startData[0..<95_234_877]), Swift calls public convenience init<S : Sequence where S.Iterator.Element == Iterator.Element>(_ elements: S) of RangeReplaceableCollection, which is defined in RangeReplaceableCollection.swift.gyb. The core part of the implementation looks like this:
for element in newElements {
    append(element)
}
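As you may know, repeatedly calling append on a collection can be inefficient. Here each call appends a single byte, so a slice of 95_234_877 bytes means roughly 95 million separate appends.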
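To make the difference concrete, here is a small sketch that builds the same bytes both ways: once with an explicit element-by-element loop (mirroring the default implementation quoted above) and once with `subdata(in:)`. The sizes are deliberately tiny and the names `elementWise` and `bulk` are made up for illustration. The sketch only shows that the two paths give the same result and makes no claim about timings on any particular Swift version.

```swift
import Foundation

let source = Data(repeating: 7, count: 1_000)
let slice = source[0..<900]

// Element-wise path: one append call per byte, like the generic
// RangeReplaceableCollection code quoted above.
var elementWise = Data()
for byte in slice {
    elementWise.append(byte)
}

// Bulk path: copy the whole range in a single call.
let bulk = source.subdata(in: 0..<900)

print(elementWise == bulk)
```

Both values hold the same 900 bytes; the difference is only in how many calls it takes to get there.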
Likewise, if you want to initialize a Data from [UInt8], you should call the initializer specific to [UInt8]:
let data = Data(bytes: [UInt8](repeating: 0, count: 10_000_000))
By contrast, Data([UInt8](repeating: 0, count: 100_000_000)) calls the initializer in RangeReplaceableCollection noted above.
In my opinion, Swift should optimize such default implementations much more, but it is hard to make them as efficient as type-specific operations.