f# aleagpu

Alea does not dispose memory correctly


The following F# code crashes on the third call with an out-of-memory exception. Either I am missing something, or Alea is not freeing memory correctly for some reason. I've tried it both in F# Interactive and as a compiled program. I've also tried calling Dispose manually, but that did not help. Any idea why?

let squareGPU (inputs:float[]) =
    // worker, squareKernel and divup are defined elsewhere (not shown here).
    // `use` disposes each device buffer when squareGPU returns.
    use dInputs = worker.Malloc(inputs)
    use dOutputs = worker.Malloc(inputs.Length)
    let blockSize = 256
    let numSm = worker.Device.Attributes.MULTIPROCESSOR_COUNT
    let gridSize = Math.Min(16 * numSm, divup inputs.Length blockSize)
    let lp = new LaunchParam(gridSize, blockSize)
    worker.Launch <@ squareKernel @> lp dOutputs.Ptr dInputs.Ptr inputs.Length
    // Copy the results back into a new host array.
    dOutputs.Gather()


let x = squareGPU [|0.0..0.001..100000.0|]
printfn "1" 
let y = squareGPU [|0.0..0.001..100000.0|]
printfn "2" 
let z = squareGPU [|0.0..0.001..100000.0|]
printfn "3"

Solution

  • I guess you got a System.OutOfMemoryException, right? That does not mean the GPU ran out of device memory; it means the process ran out of host memory. In your example, each call creates a very large input array on the host, computes on it, and gathers another equally large array as output. The point is that you bind each output to a different top-level name (x, y and z), so all of the output arrays stay reachable and the GC has no chance to free them. Eventually you run out of host memory.
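
    To put numbers on "rather large", here is a rough back-of-the-envelope sketch. The helper names (elementCount, arrayBytes, toMiB) are made up for illustration, and the element count is an approximation: the real range expression may differ by one element because of floating-point rounding, and per-object overhead is ignored.

    ```fsharp
    // Rough size estimate for a float array built from 0.0 .. 0.001 .. stop.
    // All helper names here are illustrative, not part of any library.

    /// Approximate element count: 1000 steps per unit, plus the start value.
    let elementCount (stop: int64) = stop * 1000L + 1L

    /// Payload size in bytes (a .NET float is 8 bytes); headers are ignored.
    let arrayBytes (stop: int64) = elementCount stop * int64 sizeof<float>

    /// Converts a byte count to MiB for printing.
    let toMiB (bytes: int64) = float bytes / (1024.0 * 1024.0)

    printfn "stop = 100000: %d elements, about %.0f MiB" (elementCount 100000L) (toMiB (arrayBytes 100000L))
    printfn "stop =  30000: %d elements, about %.0f MiB" (elementCount 30000L) (toMiB (arrayBytes 30000L))
    ```

    With a stop value of 100000, each array is roughly 760 MiB, and every squareGPU call needs one for the input and one for the output. A 32-bit process typically has only about 2 GiB of address space to work with, so keeping several of these arrays alive cannot fit.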

    I ran a very simple test (using a stop value of 30000 instead of the 100000 in your example). This test uses only host code, no GPU code:

    let x1 = [|0.0..0.001..30000.0|]
    printfn "1" 
    let x2 = [|0.0..0.001..30000.0|]
    printfn "2" 
    let x3 = [|0.0..0.001..30000.0|]
    printfn "3"
    let x4 = [|0.0..0.001..30000.0|]
    printfn "4"
    let x5 = [|0.0..0.001..30000.0|]
    printfn "5"
    let x6 = [|0.0..0.001..30000.0|]
    printfn "6"
    

    I ran this code in F# Interactive (which is a 32-bit process) and got this:

    Microsoft (R) F# Interactive version 12.0.30815.0
    Copyright (c) Microsoft Corporation. All Rights Reserved.
    
    For help type #help;;
    
    > 
    1
    2
    System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
       at System.Collections.Generic.List`1.set_Capacity(Int32 value)
       at System.Collections.Generic.List`1.EnsureCapacity(Int32 min)
       at System.Collections.Generic.List`1.Add(T item)
       at Microsoft.FSharp.Collections.SeqModule.ToArray[T](IEnumerable`1 source)
       at <StartupCode$FSI_0002>.$FSI_0002.main@() in C:\Users\Xiang\Documents\Inbox\ConsoleApplication6\Script1.fsx:line 32
    Stopped due to error
    > 
    

    So after creating two of these large arrays (x1 and x2), the process ran out of host memory while building the third.
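
    The stack trace also hints at why the limit is reached so quickly: in this F# version the range expression is materialized through Seq.toArray, which fills a growing List<T>. Each time the list grows it allocates a larger buffer and copies the old one, so peak usage during construction is well above the final array size. A minimal sketch of an alternative that allocates the array once is shown here. rangeArray is a hypothetical helper name, and because the element count is computed by rounding, the result may differ by one element (and in the last digits of some values) from the range expression.

    ```fsharp
    /// Builds roughly the same values as [| 0.0 .. step .. stop |] using a
    /// single allocation. Illustrative helper, not a library function.
    let rangeArray (step: float) (stop: float) : float[] =
        // Number of steps from 0.0 to stop, plus one for the start value.
        let count = int (System.Math.Round(stop / step)) + 1
        // Array.init allocates the final array up front and fills it in place.
        Array.init count (fun i -> float i * step)

    // Small demonstration with a short range.
    let xs = rangeArray 0.001 30.0
    printfn "%d elements, first = %f, last = %f" xs.Length xs.[0] xs.[xs.Length - 1]
    ```

    This only lowers the peak during construction. The finished arrays are just as large as before, so it does not help if too many of them are kept alive at once.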

    To confirm this, I reused the same variable name inside a function, which gives the GC a chance to collect the previous array, and this time it works:

    let foo() =
        let x = [|0.0..0.001..30000.0|]
        printfn "1" 
        let x = [|0.0..0.001..30000.0|]
        printfn "2" 
        let x = [|0.0..0.001..30000.0|]
        printfn "3"
        let x = [|0.0..0.001..30000.0|]
        printfn "4"
        let x = [|0.0..0.001..30000.0|]
        printfn "5"
        let x = [|0.0..0.001..30000.0|]
        printfn "6"
    
    > 
    
    val foo : unit -> unit
    
    > foo()
    ;;
    1
    2
    3
    4
    5
    6
    val it : unit = ()
    > 
    

    And if I add the GPU kernel, it still works:

    let foo() =
        let x = squareGPU [|0.0..0.001..30000.0|]
        printfn "1" 
        let x = squareGPU [|0.0..0.001..30000.0|]
        printfn "2" 
        let x = squareGPU [|0.0..0.001..30000.0|]
        printfn "3"
        let x = squareGPU [|0.0..0.001..30000.0|]
        printfn "4"
        let x = squareGPU [|0.0..0.001..30000.0|]
        printfn "5"
        let x = squareGPU [|0.0..0.001..30000.0|]
        printfn "6"
        let x = squareGPU [|0.0..0.001..30000.0|]
        printfn "7"
        let x = squareGPU [|0.0..0.001..30000.0|]
        printfn "8"
    
    > foo();;
    1
    2
    3
    4
    5
    6
    7
    8
    val it : unit = ()
    > 
    

    Alternatively, you can try running the code in a 64-bit process.
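
    If you are not sure which kind of process your script or program is running in, a quick check along these lines should tell you. It uses only standard .NET properties; how you switch to a 64-bit process depends on your tooling and is not shown here.

    ```fsharp
    open System

    // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
    printfn "64-bit process: %b" Environment.Is64BitProcess
    printfn "64-bit OS:      %b" Environment.Is64BitOperatingSystem
    printfn "Pointer size:   %d bytes" IntPtr.Size
    ```

    Running this in F# Interactive and again in the compiled program shows whether both are subject to the 32-bit address space limit.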