I have two arrays, a and b, both of type double[][].
I want to compare the inner arrays of those two in my kernel (they are of the same length).
At the moment I get an error when
use data = this.GPUWorker.MallocArray(data)
is called.
Exception thrown: 'System.Exception' in Alea.CUDA.dll
Additional information: Host array zero-copy binding is non-public feature.
I cannot see how I am using the MallocArray function incorrectly.
let inline (?+) a b = ((b - a) / a) * 100.0

let inline change a b =
    let a = a |> Array.reduce (+)
    let b = b |> Array.reduce (+)
    if a > 0.0 && b > 0.0 && (?+) a b >= 5.0 then a else 0.0

type GPU<'T>(target, op : Expr<'T[] -> 'T[] -> 'T>) =
    inherit ILGPUModule(target)

    new(target, op : Func<'T[], 'T[], 'T>) =
        new GPU<'T>(target, <@ fun x y -> op.Invoke(x, y) @>)

    [<Kernel;ReflectedDefinition>]
    member this.Kernel (n : int) (input : deviceptr<'T[]>) (input2 : deviceptr<'T[]>) (output : deviceptr<'T>) =
        let start = blockIdx.x * blockDim.x + threadIdx.x
        let stride = gridDim.x * blockDim.x
        let mutable i = start
        // TODO this is the actual logic.
        while i < n do
            let a = input.[i]
            let b = input2.[i]
            output.[i] <- __eval(op) a b
            i <- i + stride

    member this.Apply(n : int, input : deviceptr<'T[]>, input2 : deviceptr<'T[]>, output : deviceptr<'T>) =
        let numSm = this.GPUWorker.Device.Attributes.MULTIPROCESSOR_COUNT
        let blockSize = 256
        let gridSize = min (16 * numSm) (divup n blockSize)
        let lp = LaunchParam(gridSize, blockSize)
        this.GPULaunch <@ this.Kernel @> lp n input input2 output

    /// Takes in generic arrays to be used by the GPU.
    // May need modification to support other input parameters.
    member this.Apply(data : 'T[][], pattern : 'T[][]) =
        // Allocate GPU memory for the data sets.
        use data = this.GPUWorker.MallocArray(data)
        use pattern = this.GPUWorker.MallocArray(pattern)
        // Output length is likely to match the number of elements in the input array.
        use output = this.GPUWorker.Malloc(data.Length)
        // Execute GPU computation.
        this.Apply(data.Length, data.Ptr, pattern.Ptr, output.Ptr)
        // Copy data from GPU to CPU memory.
        output.Gather()

[<AOTCompile>]
type GPUModule(target) =
    inherit GPU<double>(target, fun a b -> change a b)
    static let instance = lazy new GPUModule(GPUModuleTarget.DefaultWorker)
    static member DefaultInstance = instance.Value
In version 2 you cannot use jagged arrays, because a jagged array itself is not blittable. We will support this in the next version.
I think currently you have two options:
if the dimensions of your jagged array are known, you can turn it into a linear array and do some index calculation (see the flattening sketch at the end of this answer).
you have to allocate the inner arrays separately and fill their pointers into an outer array, something like this:
let innerDMems = jaggedHostArray |> Array.map (fun array -> worker.Malloc array)
use outerDMem = worker.Malloc(innerDMems |> Array.map (fun dmem -> dmem.Ptr))
....
// launch kernel with outerDMem.Ptr, which is deviceptr<deviceptr<'T>>
....
innerDMems |> Array.iter (fun dmem -> dmem.Dispose())
and then your kernel signature is deviceptr<deviceptr<'T>>, like T** in C.
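To tie that back to the Kernel member in the question, the sketch below shows roughly what a kernel consuming such a pointer-of-pointers could look like. It is only an illustration, not tested code: it is specialised to double rather than generic 'T, it assumes every inner array has the same length (passed in here as rowLength, a name I made up), and it inlines the threshold logic of the change function instead of evaluating a quotation.

[<Kernel;ReflectedDefinition>]
member this.Kernel (n : int) (rowLength : int)
                   (input : deviceptr<deviceptr<double>>)
                   (input2 : deviceptr<deviceptr<double>>)
                   (output : deviceptr<double>) =
    let start = blockIdx.x * blockDim.x + threadIdx.x
    let stride = gridDim.x * blockDim.x
    let mutable i = start
    while i < n do
        // input.[i] is itself a deviceptr<double>, i.e. the i-th inner array.
        let rowA = input.[i]
        let rowB = input2.[i]
        // Sum both rows on the device (Array.reduce is not available here).
        let mutable sumA = 0.0
        let mutable sumB = 0.0
        for j = 0 to rowLength - 1 do
            sumA <- sumA + rowA.[j]
            sumB <- sumB + rowB.[j]
        // Same test as the change function: keep sumA when sumB is at least 5% larger.
        output.[i] <- if sumA > 0.0 && sumB > 0.0 && ((sumB - sumA) / sumA) * 100.0 >= 5.0
                      then sumA
                      else 0.0
        i <- i + stride

The host side would then launch this with the two outer pointers (outerDMem.Ptr from the snippet above and its counterpart for the second jagged array) plus the shared inner length.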
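For the first option (the flattening mentioned earlier), here is a minimal host-side sketch, assuming all inner arrays share the same length; flatten, data2D and rowLength are just illustrative names, not part of the Alea API.

// Flatten a jagged double[][] into a single blittable double[] (host side).
// Assumes every inner array has the same length.
let flatten (jagged : double[][]) = Array.concat jagged

let data2D : double[][] = [| [| 1.0; 2.0; 3.0 |]; [| 4.0; 5.0; 6.0 |] |]
let rowLength = data2D.[0].Length
let flat = flatten data2D
// use dData = worker.Malloc(flat)   // a flat double[] can be allocated as usual

// In the kernel, element (row, col) of the original jagged array is then
// addressed as flat.[row * rowLength + col].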