Tags: java, cudnn, deeplearning4j, dl4j

Is there any solution for this problem with DL4J's CUDA support?


I am trying to execute MultiGpuLenetMnistExample.java and I received the following error:

" ...

12:41:24.129 [main] INFO Test - Load data....
12:41:24.716 [main] INFO Test - Build model....
12:41:25.500 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
ND4J CUDA build version: 10.1.243
CUDA device 0: [Quadro K4000]; cc: [3.0]; Total memory: [3221225472];
12:41:26.692 [main] INFO org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for OpenMP: 32
12:41:26.746 [main] INFO org.nd4j.nativeblas.Nd4jBlas - Number of threads used for OpenMP BLAS: 0
12:41:26.755 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 8.1]
12:41:26.755 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [24]; Memory: [3,5GB];
12:41:26.755 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [CUBLAS]
12:41:26.755 [main] INFO org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner - Device Name: [Quadro K4000]; CC: [3.0]; Total/free memory: [3221225472]
12:41:26.844 [main] INFO org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
12:41:27.957 [main] DEBUG org.nd4j.jita.allocator.impl.MemoryTracker - Free memory on device_0: 2709856256
Exception in thread "main" java.lang.RuntimeException: cudaGetSymbolAddress(...) failed; Error code: [13]
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.createShapeInfo(CudaExecutioner.java:2557)
    at org.nd4j.linalg.api.shape.Shape.createShapeInformation(Shape.java:3282)
    at org.nd4j.linalg.api.ndarray.BaseShapeInfoProvider.createShapeInformation(BaseShapeInfoProvider.java:76)
    at org.nd4j.jita.constant.ProtectedCudaShapeInfoProvider.createShapeInformation(ProtectedCudaShapeInfoProvider.java:96)
    at org.nd4j.jita.constant.ProtectedCudaShapeInfoProvider.createShapeInformation(ProtectedCudaShapeInfoProvider.java:77)
    at org.nd4j.linalg.jcublas.CachedShapeInfoProvider.createShapeInformation(CachedShapeInfoProvider.java:44)
    at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:211)
    at org.nd4j.linalg.jcublas.JCublasNDArray.<init>(JCublasNDArray.java:383)
    at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:1543)
    at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:1538)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4298)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3986)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:688)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:604)
    at Test.main(Test.java:80)

Process finished with exit code 1

Is there any workaround for this problem?


Solution

  • There are two options here: either build DL4J from source for your target compute capability (3.0), or wait for the next release, since we're going to bring cc 3.0 support back for one additional release. The prebuilt ND4J CUDA 10.1 binaries no longer include kernels for cc 3.0, which is why initialization fails on the Quadro K4000.

    At this point cc 3.0 is considered deprecated by most frameworks, as far as I know 😞
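For the build-from-source option, the rough shape of the procedure is sketched below. This is a hedged outline, not an exact recipe: the repository layout and the script's flags have changed between DL4J releases, so check the build documentation for the version you are targeting before running it.

```shell
# Sketch: build the libnd4j native library for CUDA, targeting compute
# capability 3.0, then rebuild the ND4J CUDA backend against it.
# Flag names (-c, -cc) are taken from libnd4j's buildnativeoperations.sh;
# verify them against the script in your checked-out version.
git clone https://github.com/eclipse/deeplearning4j.git
cd deeplearning4j/libnd4j

# -c selects the chip (cuda); -cc sets the target compute capability
# (30 corresponds to cc 3.0, e.g. the Quadro K4000)
./buildnativeoperations.sh -c cuda -cc 30

# Rebuild the ND4J CUDA backend so it links against the fresh native build
cd ../nd4j && mvn clean install -DskipTests
```

After installing the locally built artifacts into your Maven repository, point your project's `nd4j-cuda` dependency at the locally built version so the example picks up the cc 3.0-enabled binaries.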