To my knowledge, Data is a struct that abstracts a byte buffer. It references a physical area in memory, in other words a contiguous sequence of bytes. Now I want to efficiently store multiple values in memory (as raw data), where the values are not all of the same type.
By "efficient" I mean: store all those values without any unused padding or gap bytes between them.
let a: UInt8 = 39
let b: Int32 = -20001
let string: String = "How awesome is this data?!"
Now I want to store the data of all those values sequentially in memory, without any type information.
let data = [a.asData, b.asData, string.asData].concatenated()
Imagine that the .asData property retrieves the byte representation of each instance as a [UInt8] array and then wraps it in a Data instance. The concatenated() method then just concatenates these 3 Data instances into a single Data instance as follows:
extension Collection where Element == Data {
    func concatenated() -> Data {
        reduce(into: Data()) { (result, nextDataChunk) in
            result.append(nextDataChunk)
        }
    }
}
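To make the intended layout concrete, here is a minimal sketch that builds the same packed buffer by hand. `packedBytes` is a hypothetical name, not part of the code above, and the integer bytes end up in the host's native byte order.

```swift
import Foundation

let a: UInt8 = 39
let b: Int32 = -20001
let string = "How awesome is this data?!"

// Append the raw bytes of each value back to back, with no padding in between.
var packedBytes = Data()
withUnsafeBytes(of: a) { packedBytes.append(contentsOf: $0) }
withUnsafeBytes(of: b) { packedBytes.append(contentsOf: $0) }
packedBytes.append(contentsOf: string.utf8)

// 1 byte (UInt8) + 4 bytes (Int32) + 26 bytes (UTF-8 text) = 31 bytes
print(packedBytes.count)
```

Note that the string contributes its UTF-8 bytes here, not the in-memory representation of the String value itself.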
Let's assume this all worked great and I now have this single Data instance from which I want to restore the 3 original values (with their original types). This is what I do:
var cursor = 0
let a: UInt8 = data.withUnsafeBytes { pointer in
    pointer.load(fromByteOffset: cursor, as: UInt8.self)
}
cursor += MemoryLayout<UInt8>.size // +1
let b: Int32 = data.withUnsafeBytes { pointer in
    pointer.load(fromByteOffset: cursor, as: Int32.self)
}
cursor += MemoryLayout<Int32>.size // +4
let string: String = data.withUnsafeBytes { pointer in
    pointer.load(fromByteOffset: cursor, as: String.self)
}
cursor += MemoryLayout<String>.size // +16
The problem is that this crashes with a runtime error:
Fatal error: load from misaligned raw pointer
and I know exactly why:
Int32 has an alignment of 4 (because it's 4 bytes long). In other words: when reading data with a raw pointer, the first byte of the Int32 must be at an index that is a multiple of 4. But as the first value is only a UInt8, the data bytes for the Int32 start at index 1, which is not a multiple of 4. Thus, I get the error.
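The reasoning above can be checked with MemoryLayout. This is only a sketch: it assumes the buffer's base address is itself at least 4-byte aligned, so that the byte offset alone decides whether the load is aligned. The names `offsetOfB`, `alignmentOfB` and `isAligned` are illustrative.

```swift
// b is stored directly after a, so it starts at byte offset 1.
let offsetOfB = MemoryLayout<UInt8>.size

// load(fromByteOffset:as:) requires the address to be a multiple of this value.
let alignmentOfB = MemoryLayout<Int32>.alignment

let isAligned = offsetOfB % alignmentOfB == 0
print(offsetOfB, alignmentOfB, isAligned) // 1 4 false
```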
Can I somehow use the raw Data that represents instances of different types to recreate such instances without alignment errors? How?
And if this is not possible, is there a way to automatically align the Data chunks correctly when concatenating them in the first place?
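The misalignment issue can be avoided by using Data's subdata(in:) method: it copies the requested range into a new Data, so each value is loaded from the start of its own buffer rather than from an odd offset inside the original one. Besides that, you can create some helpers to make your life easier, as follows: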
This would convert any numeric type to Data:
extension Numeric {
    var data: Data {
        var bytes = self
        return .init(bytes: &bytes, count: MemoryLayout<Self>.size)
    }
}
This would convert any type that conforms to StringProtocol (String/Substring) to Data:
extension StringProtocol {
    var data: Data { .init(utf8) }
}
This would convert any valid UTF-8 encoded sequence of bytes (UInt8) to a string:
extension DataProtocol {
    var string: String? { String(bytes: self, encoding: .utf8) }
}
These are generic methods to convert the bytes to an object or to a collection (array) of objects:
extension ContiguousBytes {
    // T must be a trivial type such as an integer; this does not work for String or Array.
    func object<T>() -> T { withUnsafeBytes { $0.load(as: T.self) } }
    func objects<T>() -> [T] { withUnsafeBytes { .init($0.bindMemory(to: T.self)) } }
}
and a simplified generic version to concatenate a collection of data:
extension Collection where Element: DataProtocol {
    var data: Data { .init(joined()) }
}
Usage:
let a: UInt8 = 39
let b: Int32 = -20001
let string: String = "How awesome is this data?!"
let data = [a.data, b.data, string.data].data
// just set the cursor (index) at the start position
var cursor = data.startIndex
// get the subdata from that position onwards
let loadedA: UInt8 = data.subdata(in: cursor..<data.endIndex).object() // 39
// advance your cursor for the next position
cursor = cursor.advanced(by: MemoryLayout<UInt8>.size)
// get your next object
let loadedB: Int32 = data.subdata(in: cursor..<data.endIndex).object() // -20001
// advance your position to the start of the string data
cursor = cursor.advanced(by: MemoryLayout<Int32>.size)
// load the subdata as string
let loadedString = data.subdata(in: cursor..<data.endIndex).string // "How awesome is this data?!"
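As an alternative to copying with subdata(in:), a possible sketch, assuming Swift 5.7 or later, is to call `loadUnaligned(fromByteOffset:as:)` on the raw buffer, which reads a trivial type from any byte offset without the alignment precondition. The block below rebuilds the packed buffer by hand so that it is self-contained. The `offset` variable is illustrative and not part of the helpers above.

```swift
import Foundation

// Build the same packed layout: UInt8, then Int32, then UTF-8 text.
var data = Data([39])
withUnsafeBytes(of: Int32(-20001)) { data.append(contentsOf: $0) }
data.append(contentsOf: "How awesome is this data?!".utf8)

var offset = 0
let loadedA = data.withUnsafeBytes {
    $0.loadUnaligned(fromByteOffset: offset, as: UInt8.self)
}
offset += MemoryLayout<UInt8>.size

// Offset 1 is not a multiple of 4, but loadUnaligned does not require alignment.
let loadedB = data.withUnsafeBytes {
    $0.loadUnaligned(fromByteOffset: offset, as: Int32.self)
}
offset += MemoryLayout<Int32>.size

// Text is decoded from its UTF-8 bytes; it is not loaded as a String value.
let loadedString = String(decoding: data.dropFirst(offset), as: UTF8.self)

print(loadedA, loadedB, loadedString)
```

This avoids one copy per value, at the cost of requiring a newer toolchain.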
edit/update: Of course, loading the string this way only works because it is located at the end of your collection of bytes. Otherwise you would also need to store its size, which takes 8 bytes for an Int on 64-bit platforms:
let a: UInt8 = 39
let b: Int32 = -20001
let string: String = "How awesome is this data?!"
let c: Int = .max
// store the UTF-8 byte count (not the character count), since that is what gets read back
let data = [a.data, b.data, string.utf8.count.data, string.data, c.data].data
var cursor = data.startIndex
let loadedA: UInt8 = data.subdata(in: cursor..<data.endIndex).object() // 39
print(loadedA)
cursor = cursor.advanced(by: MemoryLayout<UInt8>.size)
let loadedB: Int32 = data.subdata(in: cursor..<data.endIndex).object() // -20001
print(loadedB)
cursor = cursor.advanced(by: MemoryLayout<Int32>.size)
let stringCount: Int = data.subdata(in: cursor..<data.endIndex).object()
print(stringCount)
cursor = cursor.advanced(by: MemoryLayout<Int>.size)
let stringEnd = cursor.advanced(by: stringCount)
if let loadedString = data.subdata(in: cursor..<stringEnd).string { // "How awesome is this data?!"
    print(loadedString)
    cursor = stringEnd
    let loadedC: Int = data.subdata(in: cursor..<data.endIndex).object() // 9223372036854775807
    print(loadedC)
}
This would print
39
-20001
26
How awesome is this data?!
9223372036854775807
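The cursor bookkeeping above could be wrapped in a small reader type. The following is a sketch only: `ByteReader`, `read(_:)` and `readString()` are hypothetical names, it assumes Swift 5.7 or later for `loadUnaligned`, it handles fixed-width integers only, and it assumes each string is prefixed with its UTF-8 byte count stored as an Int.

```swift
import Foundation

struct ByteReader {
    let data: Data
    private(set) var cursor: Data.Index

    init(_ data: Data) {
        self.data = data
        self.cursor = data.startIndex
    }

    /// Reads one fixed-width integer at the cursor and advances past it.
    mutating func read<T: FixedWidthInteger>(_ type: T.Type = T.self) -> T? {
        let size = MemoryLayout<T>.size
        guard data.endIndex - cursor >= size else { return nil }
        let value = data[cursor..<cursor + size].withUnsafeBytes {
            $0.loadUnaligned(as: T.self)
        }
        cursor += size
        return value
    }

    /// Reads an Int byte count, then that many bytes as UTF-8 text.
    mutating func readString() -> String? {
        guard let count: Int = read(), count >= 0,
              data.endIndex - cursor >= count else { return nil }
        let text = String(bytes: data[cursor..<cursor + count], encoding: .utf8)
        cursor += count
        return text
    }
}

// Build a payload in the same order as above: UInt8, Int32, count, text, Int.
let text = "How awesome is this data?!"
var payload = Data([39])
withUnsafeBytes(of: Int32(-20001)) { payload.append(contentsOf: $0) }
withUnsafeBytes(of: text.utf8.count) { payload.append(contentsOf: $0) }
payload.append(contentsOf: text.utf8)
withUnsafeBytes(of: Int.max) { payload.append(contentsOf: $0) }

var reader = ByteReader(payload)
let readA: UInt8? = reader.read()
let readB: Int32? = reader.read()
let readText = reader.readString()
let readC: Int? = reader.read()
```

Returning optionals instead of trapping means that a truncated buffer yields nil rather than a crash. Whether that is preferable depends on where the bytes come from.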