Search code examples
arraysf#seq

If I convert a sequence to an array and treat it as a sequence, do I get O(1) on length?


I was wondering whether I would get special treatment when a sequence is converted to an array and then henceforth treated as a sequence again.

let sq = seq { for i in 0 .. 10 do yield i }
let arr = Seq.toArray sq
let len = Array.length arr      // O(1)
let sq2 = arr |> Seq.ofArray

// from converted seq
let len2 = Seq.length sq2       // O(n)???

// or direct:
let len2 = Seq.length arr       // O(n)???

On the same token, is F# smart enough with Seq.toArray arr to simply create a copy of the array, to leave it alone (not create a copy), or would it iterate over each item using the enumerator?

Put in another way, do sequence in F# remember somehow that their internal structure is an array?

I am asking this since on expensive sequences, you may need the length multiple times, and evaluating it once would be beneficial. I can either create a specific sequence type that remembers the length, or I could use the magic that's already there.


Solution

  • If the sequence is actually an array type then it will simply be cast back to an array to determine the array within Seq.length. You can see this in in the implementation of the length function here:

    [<CompiledName("Length")>]
    let length (source : seq<'T>)    = 
        checkNonNull "source" source
        match source with 
        | :? ('T[]) as a -> a.Length
        | :? ('T list) as a -> a.Length
        | :? ICollection<'T> as a -> a.Count
        | _ -> 
            use e = source.GetEnumerator() 
            let mutable state = 0 
            while e.MoveNext() do
                state <-  state + 1;
            state
    

    You can see this behaviour if you put it in FSI:

    let arr = [|1..40000000|];;
    

    Using Array.length:

    Array.length arr;;
    Real: 00:00:00.000, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
    val it : int = 40000000
    

    Using Seq.length:

    Seq.length arr;;
    Real: 00:00:00.000, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
    val it : int = 40000000
    

    If you use Seq.ofArray you are specifically hiding the underlying type information, creating a new enumerator that steps through the array element by element.

    This can be a useful behaviour because it prevents a consumer of your API from sneakily casting seq<'T> back to 'T[] and consequently allowing said consumer to mutate something that you, the API designer, expected to be exposing an immutable view of.

    The downside of this information hiding is that you can't cast back to array so the enumeration becomes significantly slower:

    Seq.length <| Seq.ofArray arr;;
    Real: 00:00:00.148, CPU: 00:00:00.140, GC gen0: 0, gen1: 0, gen2: 0
    val it : int = 40000000
    

    Seq.ofArray uses the mkSeq function which just creates an anonymous IEnumerable from an ArrayEnumerator:

    let mkSeq f = 
        { new IEnumerable<'U> with 
              member x.GetEnumerator() = f()
          interface IEnumerable with 
              member x.GetEnumerator() = (f() :> IEnumerator) }