I have a task to train neural networks using tensorflow and opencv-python on HPC nodes via Torque.
I have made a private module with a Python virtualenv and installed the tensorflow and opencv-python packages in it.
On the node I can load my Python module, but when I try to run the training script I get the following error:
Traceback (most recent call last):
  File "tensornetwork/train_user_ind_single_subj2.py", line 16, in <module>
    from reader_user_ind_single_subj import MyData
  File "/home/trig/tensornetwork/reader_user_ind_single_subj.py", line 10, in <module>
    import cv2
  File "/home/trig/privatemodules/venv_python275/lib/python2.7/site-packages/cv2/__init__.py", line 4, in <module>
    from .cv2 import *
ImportError: libSM.so.6: cannot open shared object file: No such file or directory
The training script runs on the head node, but not on a compute node.
Can you suggest how to modify my module, or add a new module, so that training runs on a compute node using Torque?
The Python module uses a system library (namely libSM.so.6: library support for the freedesktop.org version of X) that is present on the head node, but not on the compute nodes (which is not very surprising).
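To confirm the diagnosis, a small check like the one below could be run on both the head node and a compute node (for example from inside a Torque job). It is only a sketch: libSM.so.6 comes from the traceback, while the other library names are assumptions about what the cv2 binary may also need, and the function name is made up for illustration.

```python
import ctypes


def missing_libraries(names):
    """Return the names from `names` that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            # CDLL asks the dynamic loader to open the library by name,
            # using the same search path that applies when importing cv2.
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # libSM.so.6 is taken from the traceback; the remaining names are
    # guesses and may not be relevant on your system.
    candidates = ["libSM.so.6", "libICE.so.6", "libXext.so.6", "libXrender.so.1"]
    print(missing_libraries(candidates))
```

An empty list on the head node and a non-empty one on the compute node would match the explanation above.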
You can either:
- locate the library on the head node (it should be in /usr/lib or /usr/lib64 or siblings), and copy it to /home/trig/privatemodules/venv_python275/lib/python2.7/site-packages/cv2/, where Python should find it. If Python still does not find it, run export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/trig/privatemodules/venv_python275/lib/python2.7/site-packages/cv2/ in your Torque script after you load the module; or
- get the source of libSM and compile it in your home directory.
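The locate-and-copy option could be scripted roughly as follows, to be run once on the head node. This is a sketch under assumptions: the helper name copy_library is hypothetical, and the directories to search are whichever ones hold the library on your system.

```python
import os
import shutil


def copy_library(soname, search_dirs, dest_dir):
    """Copy the first `soname` found in `search_dirs` into `dest_dir`.

    Returns the path of the copy, or None if the library was not found.
    """
    for directory in search_dirs:
        source = os.path.join(directory, soname)
        if os.path.isfile(source):
            target = os.path.join(dest_dir, soname)
            # shutil.copy follows symlinks, so the copy is a regular file
            # named `soname` even if the original is a versioned symlink.
            shutil.copy(source, target)
            return target
    return None
```

With the paths from the question, the call would be copy_library("libSM.so.6", ["/usr/lib64", "/usr/lib"], "/home/trig/privatemodules/venv_python275/lib/python2.7/site-packages/cv2"). If the import then fails on a different library, the same call can be repeated with that name.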