
How do I create a monit loop for multiple processes to monitor?


This example shows how to monitor a single resque worker:

check process resque_worker_QUEUE
  with pidfile /data/APP_NAME/current/tmp/pids/resque_worker_QUEUE.pid
  start program = "/usr/bin/env HOME=/home/user RACK_ENV=production PATH=/usr/local/bin:/usr/local/ruby/bin:/usr/bin:/bin:$PATH /bin/sh -l -c 'cd /data/APP_NAME/current; nohup bundle exec rake environment resque:work RAILS_ENV=production QUEUE=queue_name VERBOSE=1 PIDFILE=tmp/pids/resque_worker_QUEUE.pid >> log/resque_worker_QUEUE.log 2>&1'" as uid deploy and gid deploy
  stop program = "/bin/sh -c 'cd /data/APP_NAME/current && kill -9 $(cat tmp/pids/resque_worker_QUEUE.pid) && rm -f tmp/pids/resque_worker_QUEUE.pid; exit 0;'"
  if totalmem is greater than 300 MB for 10 cycles then restart  # eating up memory?
  group resque_workers

Here QUEUE is typically the worker's index. Can monit itself loop, with QUEUE as the index or iterator, so that if I need 6 workers I can still keep a single block of configuration? Or must I write a monit configuration builder that does the iterating and outputs a hardcoded set of worker monitors?

So instead of

check process resque_worker_0
  with pidfile /data/APP_NAME/current/tmp/pids/resque_worker_0.pid
  start program = "/usr/bin/env HOME=/home/user RACK_ENV=production PATH=/usr/local/bin:/usr/local/ruby/bin:/usr/bin:/bin:$PATH /bin/sh -l -c 'cd /data/APP_NAME/current; nohup bundle exec rake environment resque:work RAILS_ENV=production QUEUE=queue_name VERBOSE=1 PIDFILE=tmp/pids/resque_worker_0.pid >> log/resque_worker_0.log 2>&1'" as uid deploy and gid deploy
  stop program = "/bin/sh -c 'cd /data/APP_NAME/current && kill -9 $(cat tmp/pids/resque_worker_0.pid) && rm -f tmp/pids/resque_worker_0.pid; exit 0;'"
  if totalmem is greater than 300 MB for 10 cycles then restart  # eating up memory?
  group resque_workers

check process resque_worker_1
  with pidfile /data/APP_NAME/current/tmp/pids/resque_worker_1.pid
  start program = "/usr/bin/env HOME=/home/user RACK_ENV=production PATH=/usr/local/bin:/usr/local/ruby/bin:/usr/bin:/bin:$PATH /bin/sh -l -c 'cd /data/APP_NAME/current; nohup bundle exec rake environment resque:work RAILS_ENV=production QUEUE=queue_name VERBOSE=1 PIDFILE=tmp/pids/resque_worker_1.pid >> log/resque_worker_1.log 2>&1'" as uid deploy and gid deploy
  stop program = "/bin/sh -c 'cd /data/APP_NAME/current && kill -9 $(cat tmp/pids/resque_worker_1.pid) && rm -f tmp/pids/resque_worker_1.pid; exit 0;'"
  if totalmem is greater than 300 MB for 10 cycles then restart  # eating up memory?
  group resque_workers

Ideally I could write something like this (I know the loop is pseudo-code):

(0..1).each do |QUEUE|
    check process resque_worker_QUEUE
      with pidfile /data/APP_NAME/current/tmp/pids/resque_worker_QUEUE.pid
      start program = "/usr/bin/env HOME=/home/user RACK_ENV=production PATH=/usr/local/bin:/usr/local/ruby/bin:/usr/bin:/bin:$PATH /bin/sh -l -c 'cd /data/APP_NAME/current; nohup bundle exec rake environment resque:work RAILS_ENV=production QUEUE=queue_name VERBOSE=1 PIDFILE=tmp/pids/resque_worker_QUEUE.pid >> log/resque_worker_QUEUE.log 2>&1'" as uid deploy and gid deploy
      stop program = "/bin/sh -c 'cd /data/APP_NAME/current && kill -9 $(cat tmp/pids/resque_worker_QUEUE.pid) && rm -f tmp/pids/resque_worker_QUEUE.pid; exit 0;'"
      if totalmem is greater than 300 MB for 10 cycles then restart  # eating up memory?
      group resque_workers
end
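Monit's configuration language has no loop construct that I know of. The closest feature is the `include` directive, which accepts file globs (e.g. `include /etc/monit.d/*`) but cannot parameterize a block, so the pseudo-code above has to be expanded before monit reads it. A minimal sketch using Ruby's standard-library ERB, assuming the question's own placeholders (APP_NAME, queue_name, /home/user) are kept as they are; the names `WORKER_TEMPLATE` and `monit_config` are hypothetical:

```ruby
require "erb"

# One worker's block, copied from the question; <%= i %> stands in for the
# QUEUE index from the pseudo-code. The single-quoted heredoc keeps $PATH and
# $(cat ...) literal so only ERB substitutes anything.
WORKER_TEMPLATE = ERB.new(<<~'MONIT')
  check process resque_worker_<%= i %>
    with pidfile /data/APP_NAME/current/tmp/pids/resque_worker_<%= i %>.pid
    start program = "/usr/bin/env HOME=/home/user RACK_ENV=production PATH=/usr/local/bin:/usr/local/ruby/bin:/usr/bin:/bin:$PATH /bin/sh -l -c 'cd /data/APP_NAME/current; nohup bundle exec rake environment resque:work RAILS_ENV=production QUEUE=queue_name VERBOSE=1 PIDFILE=tmp/pids/resque_worker_<%= i %>.pid >> log/resque_worker_<%= i %>.log 2>&1'" as uid deploy and gid deploy
    stop program = "/bin/sh -c 'cd /data/APP_NAME/current && kill -9 $(cat tmp/pids/resque_worker_<%= i %>.pid) && rm -f tmp/pids/resque_worker_<%= i %>.pid; exit 0;'"
    if totalmem is greater than 300 MB for 10 cycles then restart  # eating up memory?
    group resque_workers
MONIT

# Renders the template once per worker index 0 .. worker_count - 1,
# separating the blocks with a blank line.
def monit_config(worker_count)
  (0...worker_count).map { |i| WORKER_TEMPLATE.result_with_hash(i: i) }.join("\n")
end

puts monit_config(2) if __FILE__ == $PROGRAM_NAME
```

Running the script prints the two blocks from the "instead of" example; `File.write("resque_workers.monitrc", monit_config(6))` would write six, and that file can then be pulled into monitrc with `include`.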

Solution

  • I couldn't find any evidence that monit can do this on its own, so I wrote a Ruby builder for the monit resque configuration file and added it to the Capistrano deployment tasks.

    In config/deploy/production.rb:

    set :resque_worker_count, 6
    

    In lib/capistrano/tasks/monit.rake:

    # Returns the monit entry for one worker. <<~ strips the heredoc's
    # indentation (monit does not care about it either way).
    def build_entry(process_name, worker_pid_file, start_command, stop_command)
      <<~END_OF_ENTRY
        check process #{process_name}
          with pidfile #{worker_pid_file}
          start program = "#{start_command}" with timeout 90 seconds
          stop program = "#{stop_command}" with timeout 90 seconds
          if totalmem is greater than 500 MB for 4 cycles then restart # eating up memory?
          group resque
      END_OF_ENTRY
    end
    
    namespace :monit do
      desc "Build monit configuration file for monitoring resque workers"
      task :build_resque_configuration_file do
        on roles(:app) do |host|
          # Set up the values shared by every worker entry
          rails_env = fetch(:rails_env)
          app_name = fetch(:application)
          monit_resque_config_file_path = "#{shared_path}/config/monit/resque"
          resque_control_script = "#{shared_path}/bin/resque-control"
          monit_wrapper_script = "/usr/local/sbin/monit-wrapper"
          # One entry per worker: resque_0 .. resque_(resque_worker_count - 1)
          config_file_content = fetch(:resque_worker_count).to_i.times.map do |worker|
            process_name = "resque_#{worker}"
            worker_config_file = "resque_#{worker}.conf"
            worker_pid_file = "/var/run/resque/#{app_name}/resque_#{worker}.pid"
            start_command = "#{monit_wrapper_script} #{resque_control_script} #{app_name} start #{rails_env} #{worker_config_file}"
            stop_command = "#{monit_wrapper_script} #{resque_control_script} #{app_name} stop #{rails_env} #{worker_config_file}"
            build_entry(process_name, worker_pid_file, start_command, stop_command)
          end
          # Save the file locally for inspection (debugging); File.write truncates any old copy
          temp_file = "/tmp/#{app_name}_#{rails_env}_resque"
          File.write(temp_file, config_file_content.join("\n"))
          # Upload the result to the server
          upload! temp_file, monit_resque_config_file_path
        end
      end
    end
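Because the rake task mixes string building with SSHKit calls, its output can only be checked by running a deploy. A sketch of the same string building pulled into plain Ruby methods, assuming the paths and script names used in the task; the method names `worker_entry` and `resque_monit_config` and the sample values in the demo are hypothetical:

```ruby
# Hypothetical local version of the rake task's string building (no SSHKit),
# using the same wrapper, control script and pid file layout as the task.
def worker_entry(process_name, pid_file, start_command, stop_command)
  <<~ENTRY
    check process #{process_name}
      with pidfile #{pid_file}
      start program = "#{start_command}" with timeout 90 seconds
      stop program = "#{stop_command}" with timeout 90 seconds
      if totalmem is greater than 500 MB for 4 cycles then restart
      group resque
  ENTRY
end

# Builds the whole file for workers resque_0 .. resque_(worker_count - 1).
def resque_monit_config(worker_count:, app_name:, rails_env:, shared_path:)
  control_script = "#{shared_path}/bin/resque-control"
  wrapper_script = "/usr/local/sbin/monit-wrapper"
  entries = Array.new(worker_count) do |worker|
    worker_config_file = "resque_#{worker}.conf"
    # Start and stop commands differ only in the action argument
    command = ->(action) { "#{wrapper_script} #{control_script} #{app_name} #{action} #{rails_env} #{worker_config_file}" }
    worker_entry("resque_#{worker}",
                 "/var/run/resque/#{app_name}/resque_#{worker}.pid",
                 command.call("start"),
                 command.call("stop"))
  end
  entries.join("\n")
end

if __FILE__ == $PROGRAM_NAME
  # Sample values for a local dry run; not real deployment settings.
  puts resque_monit_config(worker_count: 2, app_name: "myapp",
                           rails_env: "production", shared_path: "/srv/myapp/shared")
end
```

For monit to pick up the uploaded file, monitrc has to `include` it (e.g. `include <shared_path>/config/monit/resque`) and monit must re-read its configuration with `monit reload`. In Capistrano 3 the task could be chained into the deploy flow with something like `after 'deploy:published', 'monit:build_resque_configuration_file'`; that hook is a suggestion, not part of the original setup.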