Search code examples
javamultithreadingthreadpoolthread-priority

Sporadic problems in running a multi-threaded Java project in Win7


I am working on a project that is both memory and computationally intensive. A significant portion of the execution utilizes multi-threading by a FixedThreadPool. In short; I have 1 thread for fetching data from several remote locations (using URL connections) and populating a BlockingQueue with objects to be analyzed and n threads that pick these objects and run the analysis. edit: see code below

Now this setup works like a charm on my Linux machine running OpenSUSE 11.3, but a colleague is testing it on a very similar machine running Win7 is getting custom notifications of timeouts on the queue polling (see code below), lots of them actually. I have been trying to monitor the processor use on her machine, and it appears that the software does not get any more than 15% of the CPUs while on my machine the processor usage hits the roof, just as I intended.

My question is, then, can this be a sign of "starvation" of the queue? Could it be so that the producer thread is not getting enough cpu time? If so how do I go about giving one particular thread in the pool higher priority?

UPDATE: I have been trying to pinpoint the problem, with no joy... I did however gain some new insights.

  • Profiling the execution of the code with JVisualVM demonstrates a very peculiar behavior. The methods are called in short bursts of CPU-time with several seconds of no progress in between. This to me means that somehow the OS is hitting the brakes on the process.

  • Disabling the anti-virus and back-up daemons do not have any significant affect on the matter

  • Changing the priority of java.exe (the only instance) through task manager (adviced here) does not change anything either. (That being said, I could not give "realtime" priority to java, and had to be content with "high" prio)

  • Profiling the network usage shows good flow of data in and out, so I am guessing that is not the bottleneck (while it is a considerable part of the execution time of the process, but that I know already and is pretty much the same percentage as what I get on my Linux machine).

Any ideas as to how the Win7 OS might be limiting the cpu time to my project? if it's not the OS, what could be the limiting factor? I would like to stress yet again that the machine is NOT running any other computation intensive at the same time and there is almost no load on the cpus other than my software. This is driving me crazy...

EDIT: relevant code

public ConcurrencyService(Dataset d, QueryService qserv, Set<MyObject> s){

    timeout = 3;
    this.qs = qserv;
    this.bq = qs.getQueue();
    this.ds = d;
    this.analyzedObjects = s;
    this.drc = DebugRoutineContainer.getInstance();
    this.started = false;

    int nbrOfProcs = Runtime.getRuntime().availableProcessors();
    poolSize = nbrOfProcs;
    pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(poolSize);
    drc.setScoreLogStream(new PrintStream(qs.getScoreLogFile()));
}

public void serve() throws InterruptedException {
    try {
        this.ds.initDataset();
        this.started = true;
        pool.execute(new QueryingAction(qs));
        for(;;){
            MyObject p = bq.poll(timeout, TimeUnit.MINUTES);

            if(p != null){
                if (p.getId().equals("0"))
                    break;

                pool.submit(new AnalysisAction(ds, p, analyzedObjects, qs.getKnownAssocs()));
            }else 
                drc.log("Timed out while waiting for an object...");

        }

      } catch (Exception ex) {
            ex.printStackTrace();
            String exit_msg = "Unexpected error in core analysis, terminating execution!";

      }finally{
            drc.log("--DEBUG: Termination criteria found, shutdown initiated..");
            drc.getMemoryInfo(true);    // dump meminfo to log

            pool.shutdown();

            int mins = 2;
            int nCores = poolSize;
            long    totalTasks = pool.getTaskCount(), 
                    compTasks = pool.getCompletedTaskCount(),
                    tasksRemaining = totalTasks - compTasks,
                    timeout = mins * tasksRemaining / nCores;

            drc.log("--DEBUG: Shutdown commenced, thread pool will terminate once all objects are processed, " +
                        "or will timeout in : " + timeout + " minutes... \n" + compTasks + " of " +  (totalTasks -1) + 
                        " objects have been analyzed so far, " + "mean process time is: " +
                        drc.getMeanProcTimeAsString() + " milliseconds.");

            pool.awaitTermination(timeout, TimeUnit.MINUTES);
      }

}

The class QueryingAction is a simple Runnable that calls the data acquisition method in the designated QueryService object which then populates a BlockingQueue. The AnalysisAction class does all the number-crunching for a single instance of MyObject.


Solution

  • So after weeks of fiddling, wrestling in code and other types of suffering I think I had a breakthrough, "a moment of clarity" if you will...

    I managed to show that the program can exhibits the same slow behavior on my Linux machine and can indeed run full throttle on the problematic Win-7 machine. The crux of the problem appears to be some sort of corruption of the system/cache files that are used to store the results of previous queries, and overall, speed up the analysis. You got to love the irony, in this case they appeared to be the reason for EXTREME slow analysis. In retrospect, I should have known (a la Occam's razor)...

    I am still not sure what how the corruption occurs, but at least it's probably not related to different OS. Using the system files from my machine increases the output on the Win7 host up to about 40% only however. Profiling the process more has also revealed that, oddly enough, there is significantly more GC activity on Win7, which apparently took lots of CPU time from number crunching. Giving -Xmx2g takes care of excessive garbage collection and the CPU usage for the process shoots up to 95-96%, and threads run smoothly.

    Now that my original question is answered, I have to say that overall java responsiveness is definitely better on Linux environment, even without allocating more heap memory, I can easily multi-task while I am running an extensive analysis in the background. Things are not as smooth in Win-7, e.x. resizing the GUI is significantly slow once the analysis takes off at full speed.

    Thanks for all the replies, I am sorry for the partially misleading problem description. I merely shared what I found out while debugging to the best of my abilities. Anyways, I believe the bounty goes to Peter Lawrey, since he early on pointed to an I/O issue and it was his suggestion about a logger thread which eventually led me to the answer.