
UnsafeMutablePointer.pointee and didSet properties


I got some unexpected behavior using UnsafeMutablePointer on an observed property in a struct I created (on Xcode 10.1, Swift 4.2). See the following playground code:

struct NormalThing {
    var anInt = 0
}

struct IntObservingThing {
    var anInt: Int = 0 {
        didSet {
            print("I was just set to \(anInt)")
        }
    }
}

var normalThing = NormalThing(anInt: 0)
var ptr = UnsafeMutablePointer(&normalThing.anInt)
ptr.pointee = 20
print(normalThing.anInt) // "20\n"

var intObservingThing = IntObservingThing(anInt: 0)
var otherPtr = UnsafeMutablePointer(&intObservingThing.anInt)
// "I was just set to 0\n"

otherPtr.pointee = 20
print(intObservingThing.anInt) // "0\n"

Seemingly, modifying the pointee on an UnsafeMutablePointer to an observed property doesn't actually modify the value of the property. Also, the act of assigning the pointer to the property fires the didSet action. What am I missing here?


Solution

  • Any time you see a construct like UnsafeMutablePointer(&intObservingThing.anInt), you should be extremely wary about whether it'll exhibit undefined behaviour. In the vast majority of cases, it will.

    First, let's break down exactly what's happening here. UnsafeMutablePointer doesn't have any initialisers that take inout parameters, so what initialiser is this calling? Well, the compiler has a special conversion that allows a & prefixed argument to be converted to a mutable pointer to the 'storage' referred to by the expression. This is called an inout-to-pointer conversion.

    For example:

    func foo(_ ptr: UnsafeMutablePointer<Int>) {
      ptr.pointee += 1
    }
    
    var i = 0
    foo(&i)
    print(i) // 1
    

    The compiler inserts a conversion that turns &i into a mutable pointer to i's storage. Okay, but what happens when i doesn't have any storage? For example, what if it's computed?

    func foo(_ ptr: UnsafeMutablePointer<Int>) {
      ptr.pointee += 1
    }
    
    var i: Int {
      get { return 0 }
      set { print("newValue = \(newValue)") }
    }
    foo(&i)
    // prints: newValue = 1
    

    This still works, so what storage is being pointed to by the pointer? To solve this problem, the compiler:

    1. Calls i's getter, and places the resultant value into a temporary variable.
    2. Gets a pointer to that temporary variable, and passes that to the call to foo.
    3. Calls i's setter with the new value from the temporary.

    Effectively doing the following:

    var j = i // calling `i`'s getter
    foo(&j)
    i = j     // calling `i`'s setter
    

    It should hopefully be clear from this example that this imposes an important constraint on the lifetime of the pointer passed to foo – it can only be used to mutate the value of i during the call to foo. Attempting to escape the pointer and using it after the call to foo will result in a modification of only the temporary variable's value, and not i.

    For example:

    func foo(_ ptr: UnsafeMutablePointer<Int>) -> UnsafeMutablePointer<Int> {
      return ptr
    }
    
    var i: Int {
      get { return 0 }
      set { print("newValue = \(newValue)") }
    }
    let ptr = foo(&i)
    // prints: newValue = 0
    ptr.pointee += 1
    

    ptr.pointee += 1 takes place after i's setter has been called with the temporary variable's new value, therefore it has no effect.

    Worse than that, it exhibits undefined behaviour, as the compiler doesn't guarantee that the temporary variable will remain valid after the call to foo has ended. For example, the optimiser could de-initialise it immediately after the call.

    Okay, but as long as we only get pointers to variables that aren't computed, we should be able to use the pointer outside of the call it was passed to, right? Unfortunately not. It turns out there are lots of other ways to shoot yourself in the foot when escaping inout-to-pointer conversions!

    To name just a few (there are many more!):

    • A local variable is problematic for a similar reason to our temporary variable from earlier – the compiler doesn't guarantee that it will remain initialised until the end of the scope it's declared in. The optimiser is free to de-initialise it earlier.

      For example:

      func bar() {
        var i = 0
        let ptr = foo(&i)
        // Optimiser could de-initialise `i` here.
      
        // ... making this undefined behaviour!
        ptr.pointee += 1
      }
      
    • A stored variable with observers is problematic because under the hood it's actually implemented as a computed variable that calls its observers in its setter.

      For example:

      var i: Int = 0 {
        willSet(newValue) {
          print("willSet to \(newValue), oldValue was \(i)")
        }
        didSet(oldValue) {
          print("didSet to \(i), oldValue was \(oldValue)")
        }
      }
      

      is essentially syntactic sugar for:

      var _i: Int = 0
      
      func willSetI(newValue: Int) {
        print("willSet to \(newValue), oldValue was \(i)")
      }
      
      func didSetI(oldValue: Int) {
        print("didSet to \(i), oldValue was \(oldValue)")
      }
      
      var i: Int {
        get {
          return _i
        }
        set {
          willSetI(newValue: newValue)
          let oldValue = _i
          _i = newValue
          didSetI(oldValue: oldValue)
        }
      }
      
    • A non-final stored property on classes is problematic as it can be overridden by a computed property.
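As a minimal sketch of that last point (the `Base`, `Logging`, and `increment` names are hypothetical, made up purely for illustration): once a subclass overrides the stored property with a computed one, `&object.value` has to go through the getter and setter via a temporary, just like the computed `i` from earlier. Using the pointer only for the duration of the call remains fine:

```swift
class Base {
    var value = 0   // non-final stored property
}

class Logging: Base {
    var setterCalls = 0

    // Overrides the stored property with a computed one.
    override var value: Int {
        get { return super.value }
        set {
            setterCalls += 1
            super.value = newValue
        }
    }
}

// Hypothetical helper that only uses the pointer during the call.
func increment(_ ptr: UnsafeMutablePointer<Int>) {
    ptr.pointee += 1
}

let logging = Logging()
let object: Base = logging   // statically a `Base`, dynamically a `Logging`

increment(&object.value)     // getter -> temporary -> call -> setter
print(object.value)          // 1
print(logging.setterCalls)   // 1
```

The call site only knows about `Base`, yet the write-back still runs through the overriding setter, which is why the pointer can't be assumed to point at the property's storage.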

    And this isn't even considering cases that rely on implementation details within the compiler.

    For this reason, the compiler only guarantees stable and unique pointer values from inout-to-pointer conversions on stored global variables and static stored variables without observers. In any other case, attempting to escape and use a pointer from an inout-to-pointer conversion after the call it was passed to will lead to undefined behaviour.
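For contrast, here is a small sketch of the one case described above as guaranteed. It assumes the code is top-level (e.g. in `main.swift`), so that `counter` really is a stored global variable; the names are illustrative only:

```swift
// A stored global variable without observers.
var counter = 0

// The pointer outlives the initialiser call. This relies on `counter`
// being a stored global without observers; the same line on a local
// variable or an observed property would be undefined behaviour.
let counterPtr = UnsafeMutablePointer(&counter)
counterPtr.pointee += 1

print(counter) // 1
```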


    Okay, but how does my example with the function foo relate to your example of calling an UnsafeMutablePointer initialiser? Well, UnsafeMutablePointer has an initialiser that takes an UnsafeMutablePointer argument (as a result of conforming to the underscored _Pointer protocol which most standard library pointer types conform to).

    This initialiser is effectively the same as the foo function – it takes an UnsafeMutablePointer argument and returns it. Therefore when you do UnsafeMutablePointer(&intObservingThing.anInt), you're escaping the pointer produced from the inout-to-pointer conversion – which, as we've discussed, is only valid if it's used on a stored global or static variable without observers.
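This also accounts for the didSet firing when the pointer is merely created. A minimal sketch, using a hypothetical `ignore` helper and an `observerCalls` counter that are not part of the original code: the write-back at the end of the call runs the observer even though nothing was written through the pointer.

```swift
var observerCalls = 0

// A stored variable with an observer.
var observed: Int = 0 {
    didSet { observerCalls += 1 }
}

// Hypothetical helper: accepts the pointer but neither reads nor writes through it.
func ignore(_ ptr: UnsafeMutablePointer<Int>) {}

ignore(&observed)      // the value is written back after the call, firing didSet

print(observerCalls)   // 1
print(observed)        // 0
```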

    So, to wrap things up:

    var intObservingThing = IntObservingThing(anInt: 0)
    var otherPtr = UnsafeMutablePointer(&intObservingThing.anInt)
    // "I was just set to 0\n"
    
    otherPtr.pointee = 20
    

    is undefined behaviour. The pointer produced from the inout-to-pointer conversion is only valid for the duration of the call to UnsafeMutablePointer's initialiser. Attempting to use it afterwards results in undefined behaviour. As matt demonstrates, if you want scoped pointer access to intObservingThing.anInt, you want to use withUnsafeMutablePointer(to:).
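A minimal sketch of that scoped approach, reusing the IntObservingThing struct from the question. The pointer is only used inside the closure, so the mutation is written back to the property (and the observer runs) when the call returns:

```swift
struct IntObservingThing {
    var anInt: Int = 0 {
        didSet {
            print("I was just set to \(anInt)")
        }
    }
}

var intObservingThing = IntObservingThing(anInt: 0)

// `ptr` is only valid inside the closure; don't store or return it.
withUnsafeMutablePointer(to: &intObservingThing.anInt) { ptr in
    ptr.pointee = 20
}
// prints: I was just set to 20

print(intObservingThing.anInt) // 20
```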

    I'm actually currently working on implementing a warning (which will hopefully transition to an error) that will be emitted on such unsound inout-to-pointer conversions. Unfortunately I haven't had much time lately to work on it, but all things going well, I'm aiming to start pushing it forwards in the new year, and hopefully get it into a Swift 5.x release.

    In addition, it's worth noting that the compiler doesn't currently guarantee well-defined behaviour for:

    var normalThing = NormalThing(anInt: 0)
    var ptr = UnsafeMutablePointer(&normalThing.anInt)
    ptr.pointee = 20
    

    From the discussion on #20467, it looks like this will likely be something that the compiler does guarantee well-defined behaviour for in a future release, due to the fact that the base (normalThing) is a fragile stored global variable of a struct without observers, and anInt is a fragile stored property without observers.