Tags: tensorflow, tensorflow2.0, mnist

CUDA_ERROR_OUT_OF_MEMORY: out of memory on GPU


My GPU info is below.

+-----------------------------------------------------------------------------+                                      
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |                                      
|-------------------------------+----------------------+----------------------+                                       
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |                                        
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |                                         
|===============================+======================+======================|                                         
|   0  GeForce GTX 750 Ti  Off  | 00000000:01:00.0  On |                  N/A |                                          
| 34%   51C    P0     2W /  38W |   1909MiB /  1993MiB |      0%      Default |                                           
+-------------------------------+----------------------+----------------------+                                           

+-----------------------------------------------------------------------------+                                             
| Processes:                                                       GPU Memory |                                              
|  GPU       PID   Type   Process name                             Usage      |                                                
|=============================================================================|                                                
|    0      3492      C   python                                      1467MiB |                                                
|    0      7875      G   ...yCharm-C/ch-0/193.5233.109/jbr/bin/java     2MiB |                                                 
|    0     30812      G   /usr/lib/xorg/Xorg                           163MiB |                                                  
|    0     31133      G   kwin_x11                                      25MiB |                                                  
|    0     31137      G   /usr/bin/krunner                               1MiB |
|    0     31139      G   /usr/bin/plasmashell                          55MiB |
|    0     31536      G   ...uest-channel-token=13296030830960435903   176MiB |
+-----------------------------------------------------------------------------+

When I run the mnist tutorial here: https://www.tensorflow.org/tutorials/quickstart/beginner

I received this error:

2019-12-10 00:27:06.891510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 115 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-12-10 00:27:06.894510: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 115.56M (121176064 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-12-10 00:27:22.271281: F ./tensorflow/core/kernels/random_op_gpu.h:227] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: out of memory

I am using TensorFlow 2 on Ubuntu. I have 2 questions: 1) My machine has 64 GB of system RAM, and my GPU has about 2 GB of memory. When it reports the error 'out of memory', is that because training uses only the GPU's memory and not the 64 GB of system RAM?

2) How can I solve this out-of-memory error?


Solution

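  • Yes. Training on the GPU uses the GPU's own memory (VRAM), because the model and the data are copied to the GPU for training. The 64 GB of system RAM does not count toward that limit.

    The problem is that your video card has very little video memory, and most of it is already taken: your nvidia-smi output shows 1909 MiB of 1993 MiB in use, 1467 MiB of it by another python process (PID 3492). That is why TensorFlow could create its device with only 115 MB. Even with the card otherwise idle, 2 GB of VRAM is tight for most deep learning work.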

    I recommend a video card with at least 6 GB of VRAM.

    If upgrading the hardware is not an option, you could use cloud GPUs instead, for example through AWS (Amazon Web Services) or Google Colab.
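
As a rough sketch of what can be tried before changing hardware: the nvidia-smi figures in the question suggest that almost all of the card's memory was already taken when the tutorial started, much of it by another python process. The snippet below does the arithmetic with those figures and shows two TensorFlow 2 settings that are commonly used in this situation: hiding the GPU so the tutorial runs on the CPU and system RAM, and enabling memory growth so TensorFlow allocates GPU memory on demand instead of reserving it up front. The names free_mib and configure_tensorflow are made up for this illustration, and it is an assumption that the other python process (PID 3492) can be stopped. Neither setting adds memory to the card.

```python
import os

# Figures in MiB, copied from the nvidia-smi output in the question.
TOTAL_MIB = 1993
USED_MIB = 1909
OTHER_PYTHON_MIB = 1467  # PID 3492, the other "python" process


def free_mib(total_mib, used_mib):
    """Memory still available to a new process, in MiB."""
    return total_mib - used_mib


def configure_tensorflow(use_gpu=True):
    """Import TensorFlow with a conservative GPU memory policy.

    Hypothetical helper: call it once, before anything else imports
    TensorFlow or touches the GPU.
    """
    if not use_gpu:
        # Hide all CUDA devices so TensorFlow falls back to the CPU
        # and uses ordinary system RAM.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

    import tensorflow as tf

    # Allocate GPU memory on demand rather than all at once.
    # This has to happen before the GPU is initialized.
    for gpu in tf.config.experimental.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)
    return tf


if __name__ == "__main__":
    print("Free now (MiB):", free_mib(TOTAL_MIB, USED_MIB))
    print(
        "Free if the other python process is stopped (MiB):",
        free_mib(TOTAL_MIB, USED_MIB - OTHER_PYTHON_MIB),
    )
```

To use it, call tf = configure_tensorflow(use_gpu=False) at the very top of the tutorial script to run on the CPU, or configure_tensorflow() after stopping the other python process to retry on the GPU. The model in the beginner tutorial is small, so training on the CPU is usually practical.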