Tags: ios, swift, subscript-operator

Good behavior for subscript


I'm creating an extension for String and I'm trying to decide what proper/expected/good behavior would be for a subscript operator. Currently, I have this:

// Will crash on 0 length strings
subscript(kIndex: Int) -> Character {
    var index = kIndex
    index = index < 0 ? 0 : index
    index = index >= self.length ? self.length-1 : index
    let i = self.startIndex.advancedBy(index)
    return self.characters[i]
}

This clamps every out-of-range index to the nearest end of the string. While this avoids crashing on a bad index, it doesn't feel like the right thing to do. I can't throw an exception from a subscript, and leaving the index unchecked causes an EXC_BAD_INSTRUCTION crash if the index is out of bounds. The only other option I can think of is to return an optional, but that seems awkward. Weighing the options, what I have seems to be the most reasonable, but I don't think anybody using this would expect a bad index to return a valid result.

So, my question is: what is the "standard" expected behavior of the subscript operator and is returning a valid element from an invalid index acceptable/appropriate? Thanks.


Solution

  • If you're implementing an Int-based subscript on String, you might want to first think about why the standard library chooses not to provide one.

    When you call self.startIndex.advancedBy(index), you're effectively writing something like this:

    var i = self.startIndex
    for _ in 0..<index { i = i.successor() }  // one step per position, restarting on every call
    

    This occurs because String.CharacterView.Index is not a random-access index type. See docs on advancedBy. String indices aren't random-access because each Character in a string may be any number of bytes in the string's underlying storage — you can't just get character n by jumping n * characterSize into the storage like you can with a C string.
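
    As a rough illustration of that last point, the sketch below prints how many UTF-8 bytes a few single Characters occupy. It is written in current Swift syntax, and UTF-8 byte counts are only a stand-in for storage size here, since the actual storage encoding is an implementation detail of String.

    ```swift
    // Each element is ONE Character, written with escapes so the scalars are unambiguous.
    let samples: [Character] = [
        "a",                      // U+0061
        "\u{E9}",                 // é as a single precomposed scalar
        "\u{65E5}",               // 日
        "\u{1F1EF}\u{1F1F5}",     // 🇯🇵: two regional-indicator scalars forming one Character
    ]

    // Encoded size of each Character in UTF-8.
    let byteCounts = samples.map { String($0).utf8.count }

    for (character, bytes) in zip(samples, byteCounts) {
        print("\(character): \(bytes) UTF-8 byte(s)")
    }
    ```

    The counts differ from one Character to the next, so there is no fixed characterSize to multiply an offset by.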

    So, if one were to use your subscript operator to iterate through the characters in a string:

    for i in 0..<string.characters.count {
        doSomethingWith(string[i])
    }
    

    ... you'd have a loop that looks like it runs in linear time, because it looks just like an array iteration: each pass should take the same amount of time, because each one just increments i and uses a constant-time access to get string[i], right? Nope. The advancedBy call walks from startIndex every time, so the number of successor calls grows with i: the pass for string[1] calls it once, the pass for string[2] calls it twice, and so on. If your string has n characters, the last pass calls successor n-1 times (even though the pass before it already walked n-2 of those steps and threw the result away). In other words, you've just made an O(n²) operation that looks like an O(n) operation, leaving a performance-cost bomb for whoever else uses your code.

    This is the price of a fully Unicode-aware string library.
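
    To make that cost concrete, here is a small sketch in current Swift syntax (where successor() is spelled index(after:)). The function character(in:atOffset:) is a hypothetical stand-in for an Int subscript, not a standard library API; it assumes a valid offset and reports how many index steps it took.

    ```swift
    // Hypothetical stand-in for an Int subscript: walks from startIndex and
    // reports how many single-step advances it needed. Assumes 0 <= offset < s.count.
    func character(in s: String, atOffset offset: Int) -> (character: Character, steps: Int) {
        var i = s.startIndex
        var steps = 0
        while steps < offset {
            i = s.index(after: i)   // current spelling of successor()
            steps += 1
        }
        return (s[i], steps)
    }

    let text = "subscript"
    let n = text.count

    // Quadratic: every lookup restarts from startIndex.
    var totalSteps = 0
    var viaOffsets: [Character] = []
    for offset in 0..<n {
        let result = character(in: text, atOffset: offset)
        viaOffsets.append(result.character)
        totalSteps += result.steps
    }

    // Linear: iterate the characters directly, visiting each one once.
    var viaIteration: [Character] = []
    for ch in text {
        viaIteration.append(ch)
    }

    print("characters: \(n), index steps via offsets: \(totalSteps)")
    ```

    Both loops collect the same characters, but the offset-based one takes n(n-1)/2 index steps in total, while direct iteration visits each character once.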


    Anyhow, to answer your actual question — there are two schools of thought for subscripts and domain checking:

    • Have an optional return type: subscript(index: Index) -> Element?

      This makes sense when there's no sensible way for a client to check whether an index is valid without performing the same work as a lookup — e.g. for a dictionary, finding out if there's a value for a given key is the same as finding out what the value for a key is.

    • Require that the index be valid, and make a fatal error otherwise.

      The usual case for this is situations where a client of your API can and should check for validity before accessing the subscript. This is what Swift arrays do, because arrays know their count and you don't need to look into an array to see if an index is valid.

      The canonical test for this is precondition: e.g.

      subscript(index: Index) -> Element {
          precondition(isValid(index), "index must be valid")
          // ... do lookup ...
      }
      

      (Here, isValid is some operation specific to your class for validating an index, e.g. making sure it's >= 0 and < count.)
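
    The two approaches can be sketched side by side on a small wrapper type. Shelf and the checked: label are hypothetical names made up for this illustration; the standard library's Dictionary and Array subscripts are the real-world examples of each style.

    ```swift
    // Hypothetical collection wrapper used only to contrast the two styles.
    struct Shelf {
        private var items: [String]

        init(_ items: [String]) {
            self.items = items
        }

        var count: Int { return items.count }

        // School 1: optional return type. A bad index yields nil.
        subscript(checked index: Int) -> String? {
            guard index >= 0 && index < items.count else { return nil }
            return items[index]
        }

        // School 2: the caller must pass a valid index, otherwise the program traps.
        subscript(index: Int) -> String {
            precondition(index >= 0 && index < items.count, "index must be valid")
            return items[index]
        }
    }

    let shelf = Shelf(["atlas", "novel"])
    let first = shelf[0]                 // non-optional String
    let missing = shelf[checked: 5]      // nil rather than a crash

    // Dictionary follows school 1: a lookup for an absent key returns nil.
    let ages = ["ann": 30]
    let unknownAge = ages["bob"]
    ```

    Calling shelf[5] here would stop the program at the precondition, which is the same contract Array offers.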

    In just about any use case, it's not idiomatic Swift to return a "real" value in the case of a bad index, nor is it appropriate to return a sentinel value — separating in-band values from sentinels is the reason Swift has Optionals.

    Which of these is more appropriate for your use case is... well, since your use case is problematic to begin with, it's sort of a wash. If you precondition that index < count, you still incur an O(n) cost just to check that (because a String has to examine its contents to figure out which sequences of bytes constitute each character before it knows how many characters it has). If you make your return type optional, and return nil after calling advancedBy or count, you've still incurred that O(n) cost.
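
    For completeness, a minimal sketch of the optional-returning variant in current Swift syntax, where advancedBy is spelled index(_:offsetBy:limitedBy:). The offset: label is a hypothetical name, chosen here so that call sites don't look like a constant-time array lookup. The bounds check falls out of the walk itself, but the walk is still O(n).

    ```swift
    extension String {
        // Hypothetical labeled subscript: returns nil for any offset outside the string.
        // Still O(n), because it walks from startIndex one Character at a time.
        subscript(offset offset: Int) -> Character? {
            guard offset >= 0,
                  // Returns nil if the walk would pass endIndex.
                  let i = index(startIndex, offsetBy: offset, limitedBy: endIndex),
                  // endIndex itself is "one past the end", so it has no Character.
                  i != endIndex
            else { return nil }
            return self[i]
        }
    }

    let word = "h\u{E9}llo"              // "héllo"
    let second = word[offset: 1]         // Optional("é")
    let pastEnd = word[offset: 5]        // nil
    let negative = word[offset: -1]      // nil
    let fromEmpty = ""[offset: 0]        // nil, so empty strings no longer crash
    ```

    Unlike the clamping version in the question, a bad index here never produces a valid-looking Character.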