
How to Debug CUDA code on a remote server?


I want to debug CUDA code on a remote server. My setup is as follows:

PC: Ubuntu 16.04, CUDA 8.0.61, nvcc v8.0.61, GeForce MX150 and integrated graphics

Server: Ubuntu 14.04, CUDA 8.0.61, nvcc v8.0.61, 2x Tesla P100-PCIE

I have installed Nsight Eclipse Edition 7.5 on my PC. I want to use its remote debugging feature, so that I can work in the visual debugging window on my PC while a gdbserver runs on the remote server, but I have run into some problems.

When I configure remote debugging and try to connect to the remote server, the connection fails with "Connection timed out".

I don't know whether this is related to the port. I log in to the server like this:

ssh -p 50034 [email protected]

When setting up the connection, I noticed that Nsight uses port 2345 by default, so I wonder whether there is a conflict.

So far, I have tried the following:

  1. Reinstalling Nsight Eclipse Edition, and debugging the sample code directly on the server from the command line, which works correctly.
  2. Some forums say that port 2345 must be opened. AFAIK, you can use this port as long as it is not occupied when requested, but I am not sure about that.
  3. Since the server restricts which client IP addresses may connect, I tried switching networks several times, but without success.

Any ideas?


Solution

  • I finally solved this problem, to my great relief.

    Because my server sits in a cluster, its ports are mapped: to connect to it I have to specify the mapped port explicitly (i.e. ssh -p 50034 [email protected]). When Nsight Eclipse Edition tries to reach the server on its default port 2345, that port is not mapped to the server, so the connection times out. The fix was to map port 2345 on the public address to port 2345 on the server, roughly with the following rule:

    iptables -t nat -A PREROUTING -d xxx.xxx.xxx.xxx -p tcp -m tcp --dport 2345 -j DNAT --to-destination xxx.xxx.xxx.xxx:2345
    

    The first xxx.xxx.xxx.xxx is the server's public IP address, and the second is the server's actual address inside the cluster (for example 11.11.11.24); both depend on your own setup.
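    Before starting a debug session in Nsight, one way to confirm that the mapping works is a plain TCP connect test from the PC. The sketch below is not part of the original answer: it assumes Python 3 on the PC, and the function name `probe_tcp_port` is hypothetical. A "timeout" result matches the symptom in the question (packets are dropped or not routed, as with the unmapped port), while "refused" usually means the packets reached a host but nothing was listening on that port yet, for example because the debug server had not been started.

```python
import socket


def probe_tcp_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Hypothetical helper: attempt a plain TCP connect and classify the outcome.

    Returns:
      "open"    - something accepted the connection
      "refused" - the host answered but actively rejected the connection
      "timeout" - no answer within `timeout` seconds (the symptom in the question)
      "error: ..." - any other socket error (DNS failure, unreachable network, ...)
    """
    try:
        # create_connection resolves the host and connects, honouring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Catch-all for other network errors; must come after the subclasses above.
        return f"error: {exc}"
```

    For example, probe_tcp_port("xxx.xxx.xxx.xxx", 2345) run on the PC (with the server's public address filled in) should return "open" once the rule is in place and the debug server is listening on the server.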