The following F# code crashes on the third call with an out-of-memory exception. Either I am missing something or Alea is not freeing memory correctly. I've tried it both in F# Interactive and compiled. I've also tried calling Dispose manually, but that did not help. Any idea why?
let squareGPU (inputs:float[]) =
    use dInputs = worker.Malloc(inputs)
    use dOutputs = worker.Malloc(inputs.Length)
    let blockSize = 256
    let numSm = worker.Device.Attributes.MULTIPROCESSOR_COUNT
    let gridSize = Math.Min(16 * numSm, divup inputs.Length blockSize)
    let lp = new LaunchParam(gridSize, blockSize)
    worker.Launch <@ squareKernel @> lp dOutputs.Ptr dInputs.Ptr inputs.Length
    dOutputs.Gather()
let x = squareGPU [|0.0..0.001..100000.0|]
printfn "1"
let y = squareGPU [|0.0..0.001..100000.0|]
printfn "2"
let z = squareGPU [|0.0..0.001..100000.0|]
printfn "3"
I guess you got System.OutOfMemoryException, right? That doesn't mean the GPU device memory ran out; it means you are running out of host memory. In your example, you create a rather large array on the host, run the calculation on it, and then gather another large array as output. The point is that you bind the output arrays to different names (x, y and z) that all stay reachable, so the GC has no chance to free them, and eventually you run out of host memory.
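To put a rough number on "rather large", here is a minimal sketch that estimates the host memory one such array needs. The helper name `estimateBytes` is made up for illustration, and the element count is only approximate, since the exact length of a floating-point range can differ by one element.

// Hypothetical helper: approximate size in bytes of [|0.0..step..stop|] as a float[].
// Each float (System.Double) takes 8 bytes; array header overhead is ignored.
let estimateBytes (stop: float) (step: float) : int64 =
    let count = int64 (System.Math.Round(stop / step)) + 1L
    count * int64 sizeof<float>

printfn "stop = 100000: about %d MB" (estimateBytes 100000.0 0.001 / 1000000L)
printfn "stop = 30000:  about %d MB" (estimateBytes 30000.0 0.001 / 1000000L)

So each input array in your example is on the order of 800 MB, and each gathered output is the same size again. The stack trace further down shows the array being built through a growing List, so the peak usage while it is being created is likely higher than the final size.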
I did a very simple test (using a stop value of 30000 instead of 100000 as in your example). This test uses only host code, no GPU code:
let x1 = [|0.0..0.001..30000.0|]
printfn "1"
let x2 = [|0.0..0.001..30000.0|]
printfn "2"
let x3 = [|0.0..0.001..30000.0|]
printfn "3"
let x4 = [|0.0..0.001..30000.0|]
printfn "4"
let x5 = [|0.0..0.001..30000.0|]
printfn "5"
let x6 = [|0.0..0.001..30000.0|]
printfn "6"
I ran this code in F# Interactive (which is a 32-bit process) and got this:
Microsoft (R) F# Interactive version 12.0.30815.0
Copyright (c) Microsoft Corporation. All Rights Reserved.
For help type #help;;
>
1
2
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.Collections.Generic.List`1.set_Capacity(Int32 value)
at System.Collections.Generic.List`1.EnsureCapacity(Int32 min)
at System.Collections.Generic.List`1.Add(T item)
at Microsoft.FSharp.Collections.SeqModule.ToArray[T](IEnumerable`1 source)
at <StartupCode$FSI_0002>.$FSI_0002.main@() in C:\Users\Xiang\Documents\Inbox\ConsoleApplication6\Script1.fsx:line 32
Stopped due to error
>
So after creating just two such large arrays (x1 and x2), I ran out of host memory.
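If you want to see the managed heap grow for yourself, a small sketch along these lines may help. It uses a much smaller array (one million floats) so that it runs comfortably in any process, and the exact numbers it reports will vary from run to run.

open System

// Managed heap size before the allocation (true = force a collection first).
let before = GC.GetTotalMemory(true)

// One million floats, roughly 8 MB; small enough to be safe in a 32-bit process.
let data = Array.zeroCreate<float> 1000000

// Managed heap size after the allocation; 'data' is still reachable here.
let after = GC.GetTotalMemory(true)

printfn "Managed heap grew by about %d bytes" (after - before)

// Keep the array reachable until this point so the GC cannot reclaim it early.
GC.KeepAlive data

As long as `data` stays reachable, the forced collection cannot reclaim it, which is the same situation as x1 and x2 above.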
To further confirm this, I reuse the same variable name inside a function, which gives the GC a chance to collect the old array, and this time it works:
let foo() =
    let x = [|0.0..0.001..30000.0|]
    printfn "1"
    let x = [|0.0..0.001..30000.0|]
    printfn "2"
    let x = [|0.0..0.001..30000.0|]
    printfn "3"
    let x = [|0.0..0.001..30000.0|]
    printfn "4"
    let x = [|0.0..0.001..30000.0|]
    printfn "5"
    let x = [|0.0..0.001..30000.0|]
    printfn "6"
>
val foo : unit -> unit
> foo();;
1
2
3
4
5
6
val it : unit = ()
>
And if I add the GPU kernel, it still works:
let foo() =
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "1"
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "2"
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "3"
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "4"
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "5"
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "6"
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "7"
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "8"
> foo();;
1
2
3
4
5
6
7
8
val it : unit = ()
>
Alternatively, you can try running in a 64-bit process.
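To check which kind of process your code is actually running in, a quick sketch like this should work in both F# Interactive and a compiled program. It only uses the standard .NET Environment and IntPtr members.

open System

// True when the current process is 64-bit.
printfn "64-bit process: %b" Environment.Is64BitProcess

// Pointer size is 4 bytes in a 32-bit process and 8 bytes in a 64-bit one.
printfn "Pointer size: %d bytes" IntPtr.Size

If this reports a 32-bit process, the arrays from your example are very likely too large for the available address space, no matter how the GPU memory is handled.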