Why does PSeq.map with computation expression seem to hang?


I'm writing a scraper using FSharp.Collections.ParallelSeq and a retry computation expression. I want to retrieve HTML from multiple pages in parallel and retry requests when they fail.

For example:

open System
open FSharp.Collections.ParallelSeq

type RetryBuilder(max) = 
  member x.Return(a) = a               // Enable 'return'
  member x.Delay(f) = f                // Gets wrapped body and returns it (as it is)
                                       // so that the body is passed to 'Run'
  member x.Zero() = failwith "Zero"    // Support if .. then 
  member x.Run(f) =                    // Gets function created by 'Delay'
    let rec loop(n) = 
      if n = 0 then failwith "Failed"  // Number of retries exceeded
      else try f() with _ -> loop(n-1)
    loop max

let retry = RetryBuilder(4)

let getHtml (url : string) = retry { 
    Console.WriteLine("Get Url")
    return 0;
}

//A property/field?
let GetHtmlForAllPages = 
    let pages = {1 .. 10}
    let allHtml = pages |> PSeq.map(fun x -> getHtml("http://somesite.com/" + x.ToString())) |> Seq.toArray
    allHtml

[<EntryPoint>]
let main argv = 
    let htmlForAllPages = GetHtmlForAllPages
    0 // return an integer exit code

When I try to interact with GetHtmlForAllPages from main the code seems to hang. Stepping through the code shows me that PSeq.map begins work on the first four values of pages.

What's going on that causes the retry computation expression to never start/complete? Is there some weird interplay between PSeq and retry?

The code works as expected if I make GetHtmlForAllPages a function and invoke it. What is different when GetHtmlForAllPages is a field?


Solution

  • It looks like you're deadlocking inside a static constructor. The scenario has been described as follows:

    The CLR uses an internal lock to ensure that a static constructor:

    • is only called once
    • gets executed before creation of any instance of the class or before accessing any static members.

    With this behaviour of CLR, there is a potential opportunity of a deadlock if we perform any asynchronous blocking operation in a static constructor. (...)

    The main thread will wait for the helper thread to complete within the static constructor. Since the helper thread is accessing the instance method, it will first try to acquire the internal lock. As internal lock is already acquired by the main thread, we will end-up in a deadlock situation.

    Using Parallel LINQ (or any other similar library like FSharp.Collections.ParallelSeq) in a static constructor will make you run into that problem.

    Unfortunately, a static constructor of a compiler-generated class is what you get for your GetHtmlForAllPages value. From ILSpy (with C# formatting):

    namespace <StartupCode$ConsoleApplication1>
    {
        internal static class $Program
        {
            [DebuggerBrowsable(DebuggerBrowsableState.Never)]
            internal static readonly Program.RetryBuilder retry@17;
    
            [DebuggerBrowsable(DebuggerBrowsableState.Never)]
            internal static readonly int[] GetHtmlForAllPages@24;
    
            [DebuggerBrowsable(DebuggerBrowsableState.Never), DebuggerNonUserCode, CompilerGenerated]
            internal static int init@;
    
            static $Program()
            {
                $Program.retry@17 = new Program.RetryBuilder(4);
                IEnumerable<int> pages = Operators.OperatorIntrinsics.RangeInt32(1, 1, 10);
                ParallelQuery<int> parallelQuery = PSeqModule.map<int, int>(new Program.allHtml@26(), pages);
                ParallelQuery<int> parallelQuery2 = parallelQuery;
                int[] allHtml = SeqModule.ToArray<int>((IEnumerable<int>)parallelQuery2);
                $Program.GetHtmlForAllPages@24 = allHtml;
            }
        }
    }
    

    and in your actual Program class:

    [CompilationMapping(SourceConstructFlags.Value)]
    public static int[] GetHtmlForAllPages
    {
        get
        {
            return $Program.GetHtmlForAllPages@24;
        }
    }
    

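    That's where the deadlock comes from: the main thread runs the static constructor of $Program and blocks in Seq.toArray until the PSeq workers finish, while each worker calls getHtml, which reads retry, a static field of the same $Program class, and so has to wait for that static constructor to complete.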

    As soon as you change GetHtmlForAllPages to be a function (by adding ()), its body is no longer part of that static constructor and runs only when you call it, which makes the program work as expected.
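
For reference, the function version could look roughly like the sketch below. It reuses the RetryBuilder and getHtml definitions from the question unchanged and only adds () to GetHtmlForAllPages. It assumes the FSharp.Collections.ParallelSeq package is referenced, as in the question, and the URL is the question's placeholder.

```fsharp
open System
open FSharp.Collections.ParallelSeq   // assumes the NuGet package is referenced, as in the question

// Retry builder copied from the question.
type RetryBuilder(max) =
  member x.Return(a) = a               // Enable 'return'
  member x.Delay(f) = f                // Pass the wrapped body on to 'Run'
  member x.Zero() = failwith "Zero"    // Support if .. then
  member x.Run(f) =
    let rec loop n =
      if n = 0 then failwith "Failed"  // Number of retries exceeded
      else try f() with _ -> loop (n - 1)
    loop max

// Still a top-level value, but it is cheap to construct and does not
// block on other threads, so initializing it statically is harmless.
let retry = RetryBuilder(4)

let getHtml (url : string) = retry {
    Console.WriteLine("Get Url")
    return 0
}

// The added () makes this a function: the parallel work now runs when
// the function is called, not during static initialization.
let GetHtmlForAllPages () =
    let pages = {1 .. 10}
    pages
    |> PSeq.map (fun x -> getHtml ("http://somesite.com/" + x.ToString()))
    |> Seq.toArray

// In main: let htmlForAllPages = GetHtmlForAllPages ()
```

By the time main calls GetHtmlForAllPages (), the static constructor has already assigned retry and returned, so the worker threads can read it without waiting on the initialization lock.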