I am having an issue with CPU affinity and linear integer programming in MOSEK. My program parallelizes using Python's multiprocessing module, so MOSEK runs concurrently in each process. The machine has 48 cores, so I run 48 concurrent processes using the Pool class. MOSEK's documentation states that the API is thread safe.
After starting the program, below is the output from top. It shows that ~50% of the CPU is idle. Only the first 20 lines of the top output are shown.
top - 22:04:42 up 5 days, 14:38, 3 users, load average: 10.67, 13.65, 6.29
Tasks: 613 total, 47 running, 566 sleeping, 0 stopped, 0 zombie
%Cpu(s): 46.3 us, 3.8 sy, 0.0 ni, 49.2 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
GiB Mem: 503.863 total, 101.613 used, 402.250 free, 0.482 buffers
GiB Swap: 61.035 total, 0.000 used, 61.035 free. 96.250 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
115517 njmeyer 20 0 171752 27912 11632 R 98.7 0.0 0:02.52 python
115522 njmeyer 20 0 171088 27472 11632 R 98.7 0.0 0:02.79 python
115547 njmeyer 20 0 171140 27460 11568 R 98.7 0.0 0:01.82 python
115550 njmeyer 20 0 171784 27880 11568 R 98.7 0.0 0:01.64 python
115540 njmeyer 20 0 171136 27456 11568 R 92.5 0.0 0:01.91 python
115551 njmeyer 20 0 371636 31100 11632 R 92.5 0.0 0:02.93 python
115539 njmeyer 20 0 171132 27452 11568 R 80.2 0.0 0:01.97 python
115515 njmeyer 20 0 171748 27908 11632 R 74.0 0.0 0:03.02 python
115538 njmeyer 20 0 171128 27512 11632 R 74.0 0.0 0:02.51 python
115558 njmeyer 20 0 171144 27528 11632 R 74.0 0.0 0:02.28 python
115554 njmeyer 20 0 527980 28728 11632 R 67.8 0.0 0:02.15 python
115524 njmeyer 20 0 527956 28676 11632 R 61.7 0.0 0:02.34 python
115526 njmeyer 20 0 527956 28704 11632 R 61.7 0.0 0:02.80 python
I checked the MOSEK parameters section of the documentation and didn't see anything related to CPU affinity. They have some flags related to multithreading within the optimizer. These flags are off by default, and redundantly setting them to off makes no difference.
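For what it's worth, the redundant setting was along these lines; I am only assuming which flag is in question here, since the exact names vary by MOSEK version:
## assumed flag: MOSEK 7 exposes MSK_IPAR_INTPNT_MULTI_THREAD as
## mosek.iparam.intpnt_multi_thread, with mosek.onoffkey on/off values
task.putintparam(mosek.iparam.intpnt_multi_thread, mosek.onoffkey.off)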
I checked the CPU affinity of the running Python jobs, and many of them are bound to the same CPU.
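I checked them roughly like this (the pgrep pattern just needs to match the worker processes):
for pid in $(pgrep -u njmeyer python); do
    taskset -cp $pid
done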
But the weird part is that I can't set the CPU affinity, or at least it appears to be changed again soon after I change it. I picked one of the jobs and set its CPU affinity by running taskset -p 0xFFFFFFFFFFFF 115526. I did this 10 times with 1 second in between.
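The repeated calls were driven by a loop along these lines (a reconstruction, not the exact commands; taskset -p prints the current and new masks, and taskset -cp prints the current list):
for i in $(seq 10); do
    taskset -p 0xFFFFFFFFFFFF 115526
    taskset -cp 115526
    sleep 1
done
Here is the CPU affinity mask after each taskset call.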
pid 115526's current affinity mask: 10
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 7
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 200000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
It seems like something is continually changing the CPU affinity at runtime.
I have also tried setting the CPU affinity of the parent process, but it has the same effect.
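That attempt was just taskset on the parent pid; note that children inherit affinity at fork, so the already-running workers keep their own masks:
taskset -p 0xFFFFFFFFFFFF <parent-pid>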
Here is the code I am running.
import mosek
import sys
import cPickle as pickle
import multiprocessing
import time

def mosekOptim(aCols, aVals, b, c, nCon, nVar, numTrt):
    """Solve the linear integer program.

    Solve the program
        max  c' x
        s.t. Ax <= b
    """
    ## setup mosek
    with mosek.Env() as env, env.Task() as task:
        ## collect solver log output so it can be printed on failure
        mosekMsg = []
        task.set_Stream(mosek.streamtype.log, mosekMsg.append)

        task.appendcons(nCon)
        task.appendvars(nVar)
        inf = float("inf")

        ## c
        for j, cj in enumerate(c):
            task.putcj(j, cj)

        ## bounds on A: first constraint is an equality, the rest are
        ## upper bounds
        bkc = [mosek.boundkey.fx] + [mosek.boundkey.up
                                     for i in range(nCon - 1)]
        blc = [float(numTrt)] + [-inf for i in range(nCon - 1)]
        buc = b

        ## bounds on x: all variables in [0, 1]
        bkx = [mosek.boundkey.ra for i in range(nVar)]
        blx = [0.0] * nVar
        bux = [1.0] * nVar

        for j, a in enumerate(zip(aCols, aVals)):
            task.putarow(j, a[0], a[1])
        for j, bc in enumerate(zip(bkc, blc, buc)):
            task.putconbound(j, bc[0], bc[1], bc[2])
        for j, bx in enumerate(zip(bkx, blx, bux)):
            task.putvarbound(j, bx[0], bx[1], bx[2])

        task.putobjsense(mosek.objsense.maximize)

        ## integer type
        task.putvartypelist(range(nVar),
                            [mosek.variabletype.type_int
                             for i in range(nVar)])

        task.optimize()
        task.solutionsummary(mosek.streamtype.msg)

        prosta = task.getprosta(mosek.soltype.itg)
        solsta = task.getsolsta(mosek.soltype.itg)

        xx = mosek.array.zeros(nVar, float)
        task.getxx(mosek.soltype.itg, xx)

        if solsta not in [mosek.solsta.integer_optimal,
                          mosek.solsta.near_integer_optimal]:
            print "".join(mosekMsg)
            raise ValueError("Non optimal or infeasible.")
        else:
            return xx

def reps(secs, *args):
    ## repeatedly solve the same instance for roughly `secs` seconds
    start = time.time()
    while time.time() - start < secs:
        for i in range(100):
            mosekOptim(*args)

def main():
    with open("data.txt", "r") as f:
        data = pickle.loads(f.read())
    args = (60,) + data

    pool = multiprocessing.Pool()
    jobs = []
    for i in range(multiprocessing.cpu_count()):
        jobs.append(pool.apply_async(reps, args=args))
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()
The code unpickles data I precomputed. These objects are the constraints and coefficients for the linear program. I have the code and this data file hosted in this repository.
Has anyone else experienced this behavior with MOSEK? Any suggestions for how to proceed?
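For anyone who wants to reproduce this without cloning the repository, a toy data.txt with the same tuple layout as the arguments of mosekOptim can be generated like this (the instance itself is made up):
import cPickle as pickle

## tiny made-up instance: constraint 0 is the equality sum(x) == numTrt,
## constraint 1 is 2*x0 + x1 + 3*x2 <= 4
aCols = [[0, 1, 2], [0, 1, 2]]              ## column indices per row of A
aVals = [[1.0, 1.0, 1.0], [2.0, 1.0, 3.0]]  ## values per row of A
b = [1.0, 4.0]
c = [1.0, 2.0, 3.0]
data = (aCols, aVals, b, c, 2, 3, 1)        ## (..., nCon, nVar, numTrt)
with open("data.txt", "w") as f:
    f.write(pickle.dumps(data))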
Update: I contacted support, and they suggested setting MSK_IPAR_NUM_THREADS to 1. The default value is 0, which lets MOSEK choose the number of threads itself; since my problem takes fractions of a second to solve, it never looked like it was using multiple cores. I should have checked the docs for default values.
In my code, I added task.putintparam(mosek.iparam.num_threads, 1) right after the with statement. This fixed the problem.
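Concretely, the change is just one line at the top of mosekOptim:
## setup mosek
with mosek.Env() as env, env.Task() as task:
    ## force single-threaded optimization; with 48 worker processes,
    ## letting each task spawn its own threads oversubscribes the box
    task.putintparam(mosek.iparam.num_threads, 1)
    ## ... rest of mosekOptim unchanged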