Condor running python successfully, but doesn't show output files


I'm new to HTCondor and I'm trying to run a Python script on the Condor system. I want to use cv2 and numpy in my code and be able to read my print output and my pickled data after completion.

Currently the code runs and completes (the log file shows return value 0), but condor_bin.out, where my prints should appear, is empty, and no random_dat.pickle file is transferred back.

Am I doing something wrong?

Python script:

import numpy as np
import pickle
import cv2 as cv

print('test')
# setup cv2
sift = cv.SIFT_create()
img = cv.imread("0.jpg", cv.IMREAD_GRAYSCALE)

for i in range(25):
    # calc cv2
    kp, des = sift.detectAndCompute(img, None)
    # calc np
    norms = np.linalg.norm(des, axis=1)

# calc normal? python
index = []
for p in kp:
    temp = (p.pt, p.size, p.angle, p.response, p.octave, p.class_id)
    index.append(temp)

with open('./random_dat.pickle', 'wb') as handle:
    pickle.dump((123456, index, des, norms), handle)
    
print("finished")

Condor setup file (test.info):

#Normal execution
Universe = vanilla

#I need just one CPU (which is the default)
RequestCpus    = 1
#No GPU
RequestGPUs    = 0
#I need 150 MB of disk space
RequestDisk = 150MB
#I need 150 MB of RAM (resident memory)
RequestMemory  = 150MB
#It will not run longer than 1 day
+RequestWalltime = 100

#retrieve data
#should_transfer_files = YES
#when_to_transfer_output = ON_EXIT

#I'm a nice person, I think...
NiceUser = true
#Mail me only if something is wrong
Notification = Always

# The job will 'cd' to this directory before starting, be sure you can _write_ here.
initialdir = /users/students/r0xxxxxx/Documents/testing_condor/
# This is the executable or script I want to run
executable = /users/students/r0xxxxxx/Documents/testing_condor/main.py

#Output of condors handling of the jobs, will be in 'initialdir'
Log          = condor_bin.log
#Standard output of the 'executable', in 'initialdir'
Output       = condor_bin.out
#Standard error of the 'executable', in 'initialdir'
Error        = condor_bin.err

# Start just 1 instance of the job
Queue 1

I submitted it using condor_submit test.info, which resulted in the following entries in condor_bin.log:

...
000 (356.000.000) 2021-07-15 18:23:28 Job submitted from host: <10.xx.xx.xxx:xxxx?addrs=10.xx.xx.xxx-xxxx&alias=abcdefg.abcd.abcdefg.be&noUDP&sock=schedd_2422_de78>
...
000 (357.000.000) 2021-07-15 18:24:19 Job submitted from host: <10.xx.xx.xxx:xxxx?addrs=10.xx.xx.xxx-xxxx&alias=abcdefg.abcd.abcdefg.be&noUDP&sock=schedd_2422_de78>
...
040 (356.000.000) 2021-07-15 18:24:21 Started transferring input files
    Transferring to host: <10.xx.xx.xx:xxxx?addrs=10.xx.xx.xx-xxxx&alias=other.abcd.abcdefg.be&noUDP&sock=slot1_1_123445_eb75_5374>
...
040 (356.000.000) 2021-07-15 18:24:21 Finished transferring input files
...
001 (356.000.000) 2021-07-15 18:24:22 Job executing on host: <10.xx.xx.xx:xxxx?addrs=10.xx.xx.xx-xxxx&alias=other.abcd.abcdefg.be&noUDP&sock=startd_2178_815c>
...
006 (356.000.000) 2021-07-15 18:24:22 Image size of job updated: 1
    0  -  MemoryUsage of job (MB)
    0  -  ResidentSetSize of job (KB)
...
040 (356.000.000) 2021-07-15 18:24:22 Started transferring output files
...
040 (356.000.000) 2021-07-15 18:24:22 Finished transferring output files
...
005 (356.000.000) 2021-07-15 18:24:22 Job terminated.
    (1) Normal termination (return value 0)
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
    0  -  Run Bytes Sent By Job
    803  -  Run Bytes Received By Job
    0  -  Total Bytes Sent By Job
    803  -  Total Bytes Received By Job
    Partitionable Resources :    Usage  Request Allocated 
       Cpus                 :                 1         1 
       Disk (KB)            :       13   153600    782129 
       Gpus (Average)       :                 0         0 
       Memory (MB)          :        0      150       256 

    Job terminated of its own accord at 2021-07-15T16:24:22Z.
...

As you can see from the commented-out lines in test.info, I've tried using

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

But that didn't work.

How can I see my print statements and how can I see my pickled data after completion?

Thanks a lot for your help!


Solution

  • Adding #!/usr/bin/python, as @Greg suggested, resulted in the following error:

    Executable file 'my_file/path' is a script with CRLF (DOS/Windows) line endings.
    This generally doesn't work, and you should probably run 'dos2unix myfile/path' -- or a similar tool -- before you resubmit.
    

    I then created a new Python file on my Linux system, which came with the following lines as a prefix:

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    

    This file runs successfully on Condor with should_transfer_files = YES and when_to_transfer_output = ON_EXIT set in the test.info submit file.

    TL;DR: A Python file created on Windows has CRLF line endings, which can break it when Condor runs it as an executable on Linux. Fix: write or copy your code into a file created on Linux, or convert the line endings with dos2unix as the error message suggests.
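
If dos2unix is not at hand, the same check and conversion can be done from Python before submitting. This is only a minimal sketch: the helper names check_script and to_unix_line_endings are made up for this example and are not part of HTCondor, and it assumes the two problems seen above (a missing shebang line and CRLF line endings) are the ones worth testing for.

```python
from pathlib import Path


def check_script(path):
    """Return a list of problems found in the script at `path`.

    Hypothetical helper: it only looks for the two issues that came up
    in this question, a missing shebang and CRLF line endings.
    """
    data = Path(path).read_bytes()
    problems = []
    if not data.startswith(b"#!"):
        problems.append("missing shebang line")
    if b"\r\n" in data:
        problems.append("CRLF line endings")
    return problems


def to_unix_line_endings(path):
    """Rewrite the file in place with LF line endings.

    Returns True if the file was changed, False if it was already clean.
    """
    script = Path(path)
    data = script.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed == data:
        return False
    script.write_bytes(fixed)
    return True
```

For example, calling check_script("main.py") before condor_submit reports whether the file still needs a shebang or a line-ending conversion, and to_unix_line_endings("main.py") does the conversion in place. Working on bytes rather than text avoids any newline translation by Python itself.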