How to disable apache mesos memory/disk isolation?


I am trying out Apache Aurora (0.16.0) and Apache Mesos (1.1.0) with Docker containers. Here is an example Aurora job definition:

import textwrap

process_nginx = Process(
    name='nginx',
    cmdline=textwrap.dedent(r'''
        exec /path_to/nginx -g "daemon off; pid /run/nginx.pid; error_log stderr notice;"
    '''),
    min_duration=3,
    daemon=True,
)

task_nginx = Task(
    name='nginx',
    processes=[process_nginx,],
    resources=Resources(
        cpu=0.1,
        ram=20*MB,
        disk=50*MB,
    ),
    finalization_wait=14,
)

job_nginx = Job(
    cluster='x',
    role='root',
    name='nginx',
    instances=6,
    service=True,
    task=task_nginx,
    priority=1,
    #tier='preferred',
    constraints={
        'X_HOST_MACHINE_ID': 'limit:2',
        'HOST_TYPE.FRONTEND': 'true',
    },
    update_config=UpdateConfig(
        batch_size=1,
        watch_secs=29,
        rollback_on_failure=True,
    ),
    container=Docker(
        image='my_nginx_docker_image_name',
        parameters=[
            {'name': 'network', 'value': 'host'},
            {'name': 'log-driver', 'value': 'journald'},
            {'name': 'log-opt', 'value': 'tag=nginx'},
            {'name': 'oom-score-adj', 'value': '-500'},
            {'name': 'memory-swappiness', 'value': '1'},
        ],
    ),
)

However, I find having to specify disk and RAM limits a nuisance, so I would like to disable both.

Problem 1

I expected that only the CPU resource would be isolated (i.e. limited) if all my Mesos agents were launched with the option --isolation=cgroups/cpu (rather than --isolation=cgroups/cpu,cgroups/mem).

But even in this case, every Docker container launched by the Mesos Docker containerizer is started with the --memory option, which is a hard limit and triggers the OOM killer as soon as the container needs more memory. (The Mesos Docker containerizer also does not seem to support --memory-reservation.)
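
One way to confirm the limit on an agent host is to read the value Docker recorded for the container. The following is only a sketch: it assumes the docker CLI is on the PATH and that the caller is allowed to run it, and the helper names are made up for illustration.

```python
import subprocess


def parse_memory_limit(raw: str) -> str:
    """Turn the byte count printed by `docker inspect` into a readable label."""
    limit = int(raw.strip())
    if limit == 0:
        # Docker reports 0 when the container was started without --memory.
        return "unlimited"
    return f"{limit / (1024 * 1024):.0f} MiB"


def container_memory_limit(container: str) -> str:
    """Ask the docker CLI for the hard memory limit of one container.

    `container` is a container name or ID as shown by `docker ps`.
    """
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.HostConfig.Memory}}", container],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_memory_limit(result.stdout)
```

Calling container_memory_limit() with the ID of a container started by the Docker containerizer should return a MiB figure rather than "unlimited" if the --memory option was applied.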

Problem 2

Even with --isolation=cgroups/cpu, removing the ram or disk parameter from the Aurora Resources instance causes the following error.

Error loading configuration: TypeCheck(FAILED): MesosJob[task] failed: Task[resources] failed: Resources[ram] is required.

My question

  • Is it possible to disable memory and disk isolation?
  • What is the difference between --isolation=cgroups/cpu and --isolation=cgroups/cpu,cgroups/mem?

Solution

  • As you've discovered, you can disable the memory and disk isolators in Mesos by leaving them out of the agent's --isolation flag. I'm unsure how the Docker Containerizer behaves in this scenario, but you might want to try the Mesos Containerizer instead, as this is the preferred way to run Docker images in Mesos going forward.

    As far as omitting the Resources from your Aurora config goes, unfortunately that won't be possible. Every Aurora job must specify its resource requirements so that the scheduler can match your task instances up with an offer from Mesos.
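
Whether the isolators are really disabled depends on the flags each agent was started with. Below is a minimal sketch for checking that from outside the agent. It assumes the agent serves its /flags HTTP endpoint and that the response is a JSON object with a top-level "flags" map, which is worth verifying against your Mesos version. The host name in the docstring and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def enabled_isolators(flags_json: str) -> list:
    """Extract the isolator names from the body of an agent /flags response."""
    flags = json.loads(flags_json).get("flags", {})
    isolation = flags.get("isolation", "")
    # The flag value is a comma-separated list such as "cgroups/cpu,cgroups/mem".
    return [name.strip() for name in isolation.split(",") if name.strip()]


def fetch_enabled_isolators(agent_url: str) -> list:
    """Fetch /flags from one agent, e.g. agent_url='http://agent-host:5051'."""
    with urlopen(agent_url.rstrip("/") + "/flags") as response:
        return enabled_isolators(response.read().decode("utf-8"))
```

If cgroups/mem is missing from the returned list, the agent was not started with the memory isolator. As the question observes, this does not by itself tell you what limits the Docker Containerizer passes to docker.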