process modeling simgrid

Is there a limit on the number of running processes on a host?


In the platform file I have only one host:

        <host id="Worker1" speed="100Mf" core="101"/>

Then in worker.c I create 101 (or any number > 100) processes, expecting one process to be launched on each core. But I noticed that only the first 100 processes are able to execute a task or write with XBT_INFO:

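/* Forward declaration: worker() references slave() before its definition. */
int slave(int argc, char *argv[]);

/* Spawns 101 slave processes on the current host. */
int worker(int argc, char *argv[])
{
    for (int i = 0; i < 101; ++i) {
        MSG_process_create("x", slave, NULL, MSG_host_self());
    }
    return 0;
}

/* Executes one 1e6-flop task, then terminates. */
int slave(int argc, char *argv[])
{
    msg_task_t task = MSG_task_create("kotok", 1e6, 0, NULL);
    MSG_task_execute(task);
    MSG_task_destroy(task);               /* release the task once it has run */
    MSG_process_kill(MSG_process_self());
    return 0;
}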

The processes beyond the first 100 never finish their task and cannot be killed:

[  1.000000] (0:maestro@) Oops ! Deadlock or code not perfectly clean.
[  1.000000] (0:maestro@) 1 processes are still running, waiting for something.
[  1.000000] (0:maestro@) Legend of the following listing: "Process <pid> (<name>@<host>): <status>"
[  1.000000] (0:maestro@) Process 102 (x@Worker1): waiting for execution synchro 0x26484d0 (kotok) in state 2 to finish

UPDATE: Here is the rest of the code:

main

int main(int argc, char *argv[])
{
  MSG_init(&argc, argv);

  MSG_create_environment(argv[1]);          /** - Load the platform description */
  MSG_function_register("worker", worker);
  MSG_launch_application(argv[2]);          /** - Deploy the application */

  msg_error_t res = MSG_main();             /** - Run the simulation */

  XBT_INFO("Simulation time %g", MSG_get_clock());

  return res != MSG_OK;
}

deployment.xml

<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid/simgrid.dtd">
<platform version="4">

    <process host="Worker1" function="worker">
        <argument value="0"/>
    </process>

</platform>

Solution

  • There is indeed an internal limit on the size of a maxmin system (the core of SimGrid). It is 100, and you are probably hitting it here. I just added a flag that makes this limit configurable. Could you pull the latest version, set maxmin/concurrency_limit to 1000, and see whether that fixes your issue?
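
    As a minimal sketch of how to try this: SimGrid simulators accept configuration items on the command line in the form --cfg=key:value. The binary name and the two file names below are placeholders, and the option name is copied as written in this answer. Its exact spelling may differ in your SimGrid version, so it is worth checking the output of --help-cfg first:

        ./my_simulation platform.xml deployment.xml --cfg=maxmin/concurrency_limit:1000

    Placing the option after the two file arguments leaves argv[1] and argv[2] where main expects them.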