CPU affinity issue using Python API for MOSEK


I am having an issue with CPU affinity when solving integer linear programs in MOSEK. My program parallelizes with Python's multiprocessing module, so MOSEK runs concurrently in each process. The machine has 48 cores, so I run 48 concurrent processes using the Pool class. The MOSEK documentation states that the API is thread safe.

Below is the output from top after starting the program. It shows that ~50% of the CPU is idle. Only the first 20 lines of the top output are shown.

top - 22:04:42 up 5 days, 14:38,  3 users,  load average: 10.67, 13.65, 6.29
Tasks: 613 total,  47 running, 566 sleeping,   0 stopped,   0 zombie
%Cpu(s): 46.3 us,  3.8 sy,  0.0 ni, 49.2 id,  0.7 wa,  0.0 hi,  0.0 si,  0.0 st
GiB Mem:   503.863 total,  101.613 used,  402.250 free,    0.482 buffers
GiB Swap:   61.035 total,    0.000 used,   61.035 free.   96.250 cached Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
115517 njmeyer   20   0  171752  27912  11632 R  98.7  0.0   0:02.52 python
115522 njmeyer   20   0  171088  27472  11632 R  98.7  0.0   0:02.79 python
115547 njmeyer   20   0  171140  27460  11568 R  98.7  0.0   0:01.82 python
115550 njmeyer   20   0  171784  27880  11568 R  98.7  0.0   0:01.64 python
115540 njmeyer   20   0  171136  27456  11568 R  92.5  0.0   0:01.91 python
115551 njmeyer   20   0  371636  31100  11632 R  92.5  0.0   0:02.93 python
115539 njmeyer   20   0  171132  27452  11568 R  80.2  0.0   0:01.97 python
115515 njmeyer   20   0  171748  27908  11632 R  74.0  0.0   0:03.02 python
115538 njmeyer   20   0  171128  27512  11632 R  74.0  0.0   0:02.51 python
115558 njmeyer   20   0  171144  27528  11632 R  74.0  0.0   0:02.28 python
115554 njmeyer   20   0  527980  28728  11632 R  67.8  0.0   0:02.15 python
115524 njmeyer   20   0  527956  28676  11632 R  61.7  0.0   0:02.34 python
115526 njmeyer   20   0  527956  28704  11632 R  61.7  0.0   0:02.80 python

I checked the parameters section of the MOSEK documentation and didn't see anything related to CPU affinity. There are some flags related to multithreading within the optimizer. These flags are off by default, and explicitly setting them to off changes nothing.

I checked the CPU affinity of the running python jobs, and many of them are bound to the same CPU. The weird part is that I can't set the CPU affinity, or at least it appears to be changed again soon after I set it.

I picked one of the jobs and set its CPU affinity by running taskset -p 0xFFFFFFFFFFFF 115526. I did this 10 times, 1 second apart. Here is the CPU affinity reported after each taskset call.

pid 115526's current affinity mask: 10
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 7
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 200000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 47
pid 115526's current affinity mask: ffffffffffff
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47
pid 115526's current affinity mask: 800000000000
pid 115526's new affinity mask: ffffffffffff
pid 115526's current affinity list: 0-47

It seems like something is continually changing the CPU affinity at runtime.
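To make the taskset output above easier to read, here is a minimal sketch that decodes a hexadecimal affinity mask into CPU indices and polls a process's affinity from Python. The helper names mask_to_cpus and sample_affinity are hypothetical, not part of taskset or MOSEK. The sketch is written for Python 3 (the program in the question is Python 2), and it assumes a platform that provides os.sched_getaffinity, such as Linux.

```python
import os
import time


def mask_to_cpus(mask_hex):
    """Decode a taskset-style hexadecimal affinity mask into a sorted list of CPU indices."""
    mask = int(mask_hex, 16)
    # Bit i of the mask is set when the process is allowed to run on CPU i.
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def sample_affinity(pid=0, samples=3, delay=0.1):
    """Poll the CPU affinity of `pid` a few times (pid 0 means the calling process).

    Hypothetical helper; relies on os.sched_getaffinity, which is not
    available on every platform.
    """
    history = []
    for _ in range(samples):
        history.append(sorted(os.sched_getaffinity(pid)))
        time.sleep(delay)
    return history


# Masks taken from the taskset output above.
print(mask_to_cpus("10"))                 # [4]
print(mask_to_cpus("800000000000"))       # [47]
print(len(mask_to_cpus("ffffffffffff")))  # 48
```

Calling sample_affinity(115526, samples=10, delay=1.0) would be roughly equivalent to the repeated taskset queries, assuming the caller has permission to inspect that process.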

I have also tried setting the CPU affinity of the parent process, but it has the same effect.

Here is the code I am running.

import mosek
import sys
import cPickle as pickle
import multiprocessing
import time

def mosekOptim(aCols,aVals,b,c,nCon,nVar,numTrt):
    """Solve the linear integer program.


    Solve the program
    max c' x
    s.t. Ax <= b

    """

    ## setup mosek
    with mosek.Env() as env, env.Task() as task:
        task.appendcons(nCon)
        task.appendvars(nVar)
        inf = float("inf")


        ## c
        for j,cj in enumerate(c):
            task.putcj(j,cj)


        ## bounds on A
        bkc = [mosek.boundkey.fx] + [mosek.boundkey.up
                                     for i in range(nCon-1)]

        blc = [float(numTrt)] + [-inf for i in range(nCon-1)]
        buc = b


        ## bounds on x
        bkx = [mosek.boundkey.ra for i in range(nVar)]
        blx = [0.0]*nVar
        bux = [1.0]*nVar

        for j,a in enumerate(zip(aCols,aVals)):
            task.putarow(j,a[0],a[1])

        for j,bc in enumerate(zip(bkc,blc,buc)):
            task.putconbound(j,bc[0],bc[1],bc[2])

        for j,bx in enumerate(zip(bkx,blx,bux)):
            task.putvarbound(j,bx[0],bx[1],bx[2])

        task.putobjsense(mosek.objsense.maximize)

        ## integer type
        task.putvartypelist(range(nVar),
                            [mosek.variabletype.type_int
                             for i in range(nVar)])

        task.optimize()

        task.solutionsummary(mosek.streamtype.msg)

        prosta = task.getprosta(mosek.soltype.itg)
        solsta = task.getsolsta(mosek.soltype.itg)

        xx = mosek.array.zeros(nVar,float)
        task.getxx(mosek.soltype.itg,xx)

    if solsta not in [ mosek.solsta.integer_optimal,
                   mosek.solsta.near_integer_optimal ]:
        ## mosekMsg was never defined; report the solution status instead
        raise ValueError("Non optimal or infeasible: %s" % solsta)
    else:
        return xx


def reps(secs,*args):
    start = time.time()
    while time.time() - start < secs:
        for i in range(100):
            mosekOptim(*args)


def main():
    with open("data.txt","r") as f:
        data = pickle.loads(f.read())

    args = (60,) + data

    pool = multiprocessing.Pool()
    jobs = []
    for i in range(multiprocessing.cpu_count()):
        jobs.append(pool.apply_async(reps,args=args))
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()

The code unpickles data I precomputed. These objects are the constraints and coefficients for the linear program. I have the code and this data file hosted in this repository.

Has anyone else experienced this behavior with MOSEK? Any suggestions for how to proceed?


Solution

  • I contacted support, and they suggested setting MSK_IPAR_NUM_THREADS to 1. My problem takes fractions of a second to solve, so it never looked like it was using multiple cores. I should have checked the docs for the default values.

    In my code, I added task.putintparam(mosek.iparam.num_threads,1) right after the with statement. This fixed the problem.
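    As a rough sketch of where that call goes, the helper below applies the setting to a task before anything else is done with it. The name force_single_thread is hypothetical. The mosek module is passed in as an argument only so the snippet can be read and run without MOSEK installed. The only MOSEK names used, putintparam and iparam.num_threads, are the ones from the fix above.

```python
def force_single_thread(task, mosek_module):
    """Restrict a MOSEK task to one thread, as suggested by MOSEK support.

    `task` is expected to be a mosek Task and `mosek_module` the imported
    mosek module. Both are parameters here (an assumption of this sketch)
    so that the function has no hard dependency on MOSEK.
    """
    task.putintparam(mosek_module.iparam.num_threads, 1)
    return task


# Intended use inside mosekOptim, right after the with statement:
#
#     with mosek.Env() as env, env.Task() as task:
#         force_single_thread(task, mosek)
#         task.appendcons(nCon)
#         ...
```

    With one thread per task, each of the 48 pool workers should stay on a single core instead of competing with the others for CPUs.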