Tags: python-2.7, opencv, hpc, torque, environment-modules

Create an environment module to work with opencv-python on HPC nodes


I need to train neural networks using tensorflow and opencv-python on HPC nodes via Torque.

I have created a private module with a Python virtualenv and installed the tensorflow and opencv-python packages in it.

On the node I can load my Python module, but when I try to run the training script I get the following error:

    Traceback (most recent call last):
      File "tensornetwork/train_user_ind_single_subj2.py", line 16, in <module>
        from reader_user_ind_single_subj import MyData
      File "/home/trig/tensornetwork/reader_user_ind_single_subj.py", line 10, in <module>
        import cv2
      File "/home/trig/privatemodules/venv_python275/lib/python2.7/site-packages/cv2/__init__.py", line 4, in <module>
        from .cv2 import *
    ImportError: libSM.so.6: cannot open shared object file: No such file or directory

The training script runs on the head node, but not on a compute node.

Can you suggest how to modify my module, or add a new module, so that training runs on a compute node using Torque?


Solution

  • The Python module depends on a system library (libSM.so.6, the X11 Session Management library) that is present on the head node but not on the compute nodes. That is not very surprising, since compute nodes usually have no need for X11 libraries.

    You can either:

    • ask the administrators to install that library system-wide on the compute nodes through the package manager;
    • or locate the file on the head node (probably in /usr/lib, /usr/lib64, or a sibling directory) and copy it to /home/trig/privatemodules/venv_python275/lib/python2.7/site-packages/cv2/, where it should be found when cv2 is imported. If it is still not found, run export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/trig/privatemodules/venv_python275/lib/python2.7/site-packages/cv2/ in your Torque script after you load the module;
    • or download the source for libSM and compile it in your home directory, then point LD_LIBRARY_PATH at the directory that holds the resulting library.
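
    The second option can be sketched in Python as follows. This is a minimal sketch, not a tested recipe for your cluster: the helper names (find_library_file, copy_library, can_load) and the SEARCH_DIRS list are my own assumptions, so adjust the directories to wherever your distribution keeps its libraries. It uses only the standard library and should run under both Python 2.7 and Python 3.

```python
import ctypes
import os
import shutil

# Assumption: typical system library directories; adjust for your cluster.
SEARCH_DIRS = ["/usr/lib64", "/usr/lib", "/usr/lib/x86_64-linux-gnu", "/lib64", "/lib"]


def find_library_file(name, search_dirs=SEARCH_DIRS):
    """Return the first existing path to `name` in search_dirs, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None


def copy_library(source, dest_dir):
    """Copy a library file into dest_dir (symlinks are followed); return the new path."""
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    target = os.path.join(dest_dir, os.path.basename(source))
    shutil.copy2(source, target)
    return target


def can_load(name):
    """Return True if the dynamic loader can open `name` in this environment."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    library = "libSM.so.6"
    # Run this on the head node and inside a Torque job to compare the two.
    print("%s loadable here: %s" % (library, can_load(library)))
    print("%s found at: %s" % (library, find_library_file(library)))
```

    On the head node, call copy_library(find_library_file("libSM.so.6"), dest_dir) with a directory in your home (for example the cv2 directory above). Then run the script inside a Torque job to see whether the library now loads on the compute node. Note that LD_LIBRARY_PATH has to be exported before Python starts, because the loader reads it at process start-up. The copied library may itself depend on others that are also missing (libSM typically links against libICE), so you may need to repeat the procedure for each name reported in a new ImportError.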