Storing the data representations of multiple, differently typed objects in a single Data instance


Motivation

To my knowledge, Data is a struct that abstracts a byte buffer. It references a physical area in memory, in other words: a contiguous sequence of bytes. Now I want to efficiently store multiple values in memory (as raw data), where the values are not all of the same type.

My definition of efficient here ≔ Store all those values without any unused buffer / gap bytes.

Storing the raw data in memory

let a: UInt8 = 39
let b: Int32 = -20001
let string: String = "How awesome is this data?!"

Now I want to store the data of all those values sequentially in memory, without any type information.

let data = [a.asData, b.asData, string.asData].concatenated()

Imagine that the .asData property retrieves the byte representation of each instance as a [UInt8] array and then wraps it in a Data instance. The concatenated() method then just concatenates these 3 Data instances into a single Data instance as follows:

extension Collection where Element == Data {
    func concatenated() -> Data {
        reduce(into: Data()) { (result, nextDataChunk) in
            result.append(nextDataChunk)
        }
    }
}
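
To make the resulting byte layout concrete, here is a minimal sketch of the setup above. The `asData` implementations are assumptions (the text only imagines them): for integers they copy the value's in-memory bytes, and for strings they copy the UTF-8 bytes.

```swift
import Foundation

// Hypothetical `asData` for fixed-width integers: copies the value's raw bytes.
extension FixedWidthInteger {
    var asData: Data { withUnsafeBytes(of: self) { Data($0) } }
}

// Hypothetical `asData` for strings: the UTF-8 bytes, without a terminator.
extension String {
    var asData: Data { Data(utf8) }
}

extension Collection where Element == Data {
    func concatenated() -> Data {
        reduce(into: Data()) { result, chunk in result.append(chunk) }
    }
}

let a: UInt8 = 39
let b: Int32 = -20001
let string = "How awesome is this data?!"

let data = [a.asData, b.asData, string.asData].concatenated()

// Byte offsets inside `data`: a at 0, b at 1..<5, string at 5..<31.
print(data.count) // 31
```

There are no gap bytes, which is the goal, but it also means that `b` starts at offset 1.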

Reading the data back from memory into the respective types

Let's assume this all worked great and I now have this single Data instance from which I want to restore the 3 original values (with their original types). This is what I do:

var cursor = 0

let a: UInt8 = data.withUnsafeBytes { pointer in
    pointer.load(fromByteOffset: cursor, as: UInt8.self)
}
cursor += MemoryLayout<UInt8>.size // +1

let b: Int32 = data.withUnsafeBytes { pointer in
    pointer.load(fromByteOffset: cursor, as: Int32.self)
}
cursor += MemoryLayout<Int32>.size // +4

let string: String = data.withUnsafeBytes { pointer in
    pointer.load(fromByteOffset: cursor, as: String.self)
}
cursor += MemoryLayout<String>.size // +16

The Problem

The problem is that this throws a runtime error:

Fatal error: load from misaligned raw pointer

and I know exactly why:

Int32 has an alignment of 4 (because it's 4 bytes long). In other words: When reading data with a raw pointer, the first byte of the Int32 must be at an index that is a multiple of 4. But as the first value is a UInt8 only, the data bytes for the Int32 start at index 1, which is not a multiple of 4. Thus, I get the error.
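
The alignment requirement can be inspected directly with `MemoryLayout`. A small sketch (the helper `isAligned` is a hypothetical name introduced only for illustration, and it assumes the buffer itself starts at a suitably aligned address):

```swift
// Hypothetical helper: true if `offset` is a multiple of the alignment of `T`.
func isAligned<T>(_ offset: Int, for type: T.Type) -> Bool {
    offset % MemoryLayout<T>.alignment == 0
}

print(MemoryLayout<UInt8>.alignment)  // 1
print(MemoryLayout<Int32>.alignment)  // 4

print(isAligned(0, for: UInt8.self))  // true:  the UInt8 at offset 0 loads fine
print(isAligned(1, for: Int32.self))  // false: the Int32 at offset 1 is misaligned
print(isAligned(4, for: Int32.self))  // true:  would need 3 padding bytes after the UInt8
```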


My question is this:

  • Can I somehow use the raw Data that represents instances of different types to recreate such instances without alignment errors? How?

  • And if this is not possible, is there a way to automatically align the Data chunks correctly when concatenating them in the first place?


Solution

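  • Loading from a misaligned offset can be avoided with Data's subdata(in:) method: it copies the requested bytes into a new Data instance, so each value is read from the start of that copy rather than from an odd offset in the original buffer. Beyond that, you can create some helpers to make your life easier, as follows: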

    This would convert any numeric type to Data:

    extension Numeric {
        var data: Data {
            var bytes = self
            return .init(bytes: &bytes, count: MemoryLayout<Self>.size)
        }
    }
    

    This would convert any type that conforms to StringProtocol (String/Substring) to Data:

    extension StringProtocol {
        var data: Data { .init(utf8) }
    }
    

    This would convert any valid UTF-8 encoded sequence of bytes (UInt8) to a String:

    extension DataProtocol {
        var string: String? { String(bytes: self, encoding: .utf8) }
    }
    

    These are generic methods to convert the bytes to an object or to a collection (array) of objects:

    extension ContiguousBytes {
        func object<T>() -> T { withUnsafeBytes { $0.load(as: T.self) } }
        func objects<T>() -> [T] { withUnsafeBytes { .init($0.bindMemory(to: T.self)) } }
    }
    

    and a simplified generic version to concatenate an array of data:

    extension Collection where Element: DataProtocol {
        var data: Data { .init(joined()) }
    }
    

    Usage:

    let a: UInt8 = 39
    let b: Int32 = -20001
    let string: String = "How awesome is this data?!"
    let data = [a.data, b.data, string.data].data
    
    // just set the cursor (index) at the start position
    var cursor = data.startIndex
    // get the subdata from that position onwards
    let loadedA: UInt8 = data.subdata(in: cursor..<data.endIndex).object()  // 39
    // advance your cursor for the next position
    cursor = cursor.advanced(by: MemoryLayout<UInt8>.size)
    // get your next object
    let loadedB: Int32 = data.subdata(in: cursor..<data.endIndex).object()  // -20001
    // advance your position to the start of the string data
    cursor = cursor.advanced(by: MemoryLayout<Int32>.size)
    // load the subdata as string
    let loadedString = data.subdata(in: cursor..<data.endIndex).string  // "How awesome is this data?!"
    

    edit/update: Of course, loading the string only works because it is located at the end of your collection of bytes. Otherwise you would also need to store its size, which takes 8 bytes for an Int on 64-bit platforms. The stored size has to be the number of UTF-8 bytes (utf8.count), because the character count differs for non-ASCII strings:

    let a: UInt8 = 39
    let b: Int32 = -20001
    let string: String = "How awesome is this data?!"
    let c: Int = .max
    let data = [a.data, b.data, string.utf8.count.data, string.data, c.data].data
    
    var cursor = data.startIndex
    let loadedA: UInt8 = data.subdata(in: cursor..<data.endIndex).object()  // 39
    print(loadedA)
    cursor = cursor.advanced(by: MemoryLayout<UInt8>.size)
    let loadedB: Int32 = data.subdata(in: cursor..<data.endIndex).object()  // -20001
    print(loadedB)
    cursor = cursor.advanced(by: MemoryLayout<Int32>.size)
    let stringCount: Int = data.subdata(in: cursor..<data.endIndex).object()
    print(stringCount)
    cursor = cursor.advanced(by: MemoryLayout<Int>.size)
    let stringEnd = cursor.advanced(by: stringCount)
    
    if let loadedString = data.subdata(in: cursor..<stringEnd).string {  // "How awesome is this data?!"
        print(loadedString)
        cursor = stringEnd
        let loadedC: Int = data.subdata(in: cursor..<data.endIndex).object()  // 9223372036854775807
        print(loadedC)
    }
    

    This would print:

    39
    -20001
    26
    How awesome is this data?!
    9223372036854775807
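
As an alternative to slicing with subdata(in:), the bytes can be copied into a properly aligned local variable, which sidesteps the alignment trap without creating a new Data per read. The following is only a sketch under stated assumptions: `ByteReader`, `readInteger`, and `readString` are hypothetical names that do not come from the answer above, integers are stored in the host's native byte order, and the string is prefixed with its UTF-8 byte count as an Int.

```swift
import Foundation

// Hypothetical cursor-based reader over a Data buffer (names are illustrative).
struct ByteReader {
    let data: Data
    var offset = 0  // relative to the start of `data`

    init(_ data: Data) { self.data = data }

    // Copies the next MemoryLayout<T>.size bytes into an aligned local value.
    mutating func readInteger<T: FixedWidthInteger>(_ type: T.Type) -> T? {
        let size = MemoryLayout<T>.size
        let start = offset
        guard start + size <= data.count else { return nil }
        var value: T = 0
        data.withUnsafeBytes { (source: UnsafeRawBufferPointer) -> Void in
            let slice = UnsafeRawBufferPointer(rebasing: source[start..<start + size])
            withUnsafeMutableBytes(of: &value) { $0.copyMemory(from: slice) }
        }
        offset = start + size
        return value
    }

    // Decodes the next `byteCount` bytes as UTF-8; returns nil if they are invalid.
    mutating func readString(byteCount: Int) -> String? {
        guard byteCount >= 0, offset + byteCount <= data.count else { return nil }
        let start = data.startIndex + offset
        offset += byteCount
        return String(data: data[start..<start + byteCount], encoding: .utf8)
    }
}

// Build the same kind of gap-free buffer as in the examples above.
let string = "How awesome is this data?!"
var bytes = Data([39])                                                   // UInt8
withUnsafeBytes(of: Int32(-20001)) { bytes.append(contentsOf: $0) }      // Int32 at offset 1
withUnsafeBytes(of: string.utf8.count) { bytes.append(contentsOf: $0) }  // length prefix (Int)
bytes.append(contentsOf: string.utf8)

var reader = ByteReader(bytes)
let a = reader.readInteger(UInt8.self)
let b = reader.readInteger(Int32.self)
let length = reader.readInteger(Int.self)
let text = reader.readString(byteCount: length ?? 0)

print(a == 39, b == -20001, text == string) // true true true
```

On Swift 5.7 or later, the standard library's loadUnaligned(fromByteOffset:as:) on UnsafeRawBufferPointer serves the same purpose for the integer reads.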