ruby-on-rails ruby-on-rails-4 actionmailer delayed-job sendgrid

ActionMailer + DelayedJob + Resque + SendGrid = not all emails being sent

I've got a Rails 4.2.5 app on Heroku using ActionMailer, SendGrid, and DelayedJob/Resque for background processing. I'm having a strange issue that I can't reproduce on localhost.

I send out a batch of emails. No errors are raised, so the emails appear to have been sent successfully. But when I check SendGrid's activity tab, only half of them show up there.

I haven't done a lot with SendGrid or ActionMailer, so I'm not really sure how to debug the issue or reproduce it locally. I've tried reproducing it by creating 100 contacts in my app and sending them all the same notification message (just as would happen in production), but they all go through successfully.

In development.rb and production.rb I have:

config.action_mailer.raise_delivery_errors = true

In production.rb I have:

config.action_mailer.default_url_options = { :host => 'http://firmplay.com' }
# config.action_mailer.delivery_method = :sendmail
# config.action_mailer.sendmail_settings = {
#   :location => '/usr/sbin/sendmail',
#   :arguments => '-i -t'
# }

# config.action_mailer.perform_deliveries = true
# config.action_mailer.raise_delivery_errors = true

I'm not sure if the commented code should be uncommented or not.

In environment.rb I have:

ActionMailer::Base.smtp_settings = {
  :address        => 'smtp.sendgrid.net',
  :port           => '587',
  :authentication => :plain,
  :user_name      => ENV['SENDGRID_USERNAME'],
  :password       => ENV['SENDGRID_PASSWORD'],
  :domain         => 'heroku.com',
  :enable_starttls_auto => true
}

In application.rb I have:

config.active_job.queue_adapter = :resque

I'm not sure what I'm doing wrong here or even when the issue started, though I suspect it's related to my recent upgrade from Rails 4.0 to 4.2.5. I wondered if the emails were failing somewhere in DelayedJob/Resque, so I checked the Resque logs and found nothing out of the ordinary.

Update with partial solution

I've managed to make some progress on the issue. While testing on production I realized that when my web dyno queued up the emails as background jobs, it timed out after 10 seconds, which is the limit for the app. Even after the web worker timed out, though, it kept queuing emails until it ran out of memory (I saw some R14 errors in the logs).

I wasn't able to determine why it was running out of memory when queuing emails. I wasn't storing the emails in memory as a variable, but I suppose it's possible they were unintentionally being stored as I iterated through my contacts list.

What I ended up doing is creating a background job that I could pass the contacts list to. Then the background job sends the emails out directly. When I tested this approach everything worked great and it scaled up pretty well. So in essence, my problem is solved, though I'd still like to find out why the web dyno was running out of memory when queuing emails, while the background job worker running Resque didn't have any such problems.
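For anyone wanting to see the shape of that approach, here is a minimal sketch of a Resque-style job that receives the contacts and sends each email itself. The names BulkNotificationJob, NotificationMailer, Contact, and CONTACTS are hypothetical stand-ins so the example runs without Rails, and passing contact ids rather than full records is my own assumption, not necessarily what the app does.

```ruby
# Stand-ins so the sketch runs on its own. In the real app these would be
# the ActiveRecord model and the ActionMailer class (names are hypothetical).
Contact = Struct.new(:id, :email)
CONTACTS = {
  1 => Contact.new(1, "a@example.com"),
  2 => Contact.new(2, "b@example.com")
}

module NotificationMailer
  DELIVERED = []

  # Records the delivery instead of talking to SendGrid.
  def self.deliver_notification(contact, message)
    DELIVERED << [contact.email, message]
  end
end

# Resque-style job: a class with a @queue and a class-level perform method.
class BulkNotificationJob
  @queue = :mailers

  # Takes plain ids (an assumption) so the queued payload stays small.
  def self.perform(contact_ids, message)
    contact_ids.each do |id|
      contact = CONTACTS[id]
      next if contact.nil? # skip ids that no longer exist

      NotificationMailer.deliver_notification(contact, message)
    end
    nil
  end
end

# The controller would enqueue a single job, for example:
#   Resque.enqueue(BulkNotificationJob, contact_ids, message)
# Here perform is called directly to show what the worker would do.
BulkNotificationJob.perform([1, 2, 3], "Schedule updated")
```

The point of the design is that the web request enqueues one small job and returns, while the loop over contacts runs on the worker, which is not subject to the request timeout.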

I was initially iterating through my contacts using .map, which collects the return value of every iteration into a new array and so keeps all of them in memory. I switched to .each instead, thinking that would solve the problem, but it didn't. In any case, the problem seems to be that Ruby keeps the emails in memory for the life of the controller.
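The difference between .map and .each can be checked in plain Ruby. This is only a sketch: build_email is a hypothetical stand-in for whatever creates or queues one email, and the snippet shows what each method returns, not why the dyno ran out of memory.

```ruby
contacts = %w[a@example.com b@example.com c@example.com]

# Hypothetical stand-in for building one email; returns a sizeable string.
build_email = ->(address) { "To: #{address}\n" + ("x" * 1_000) }

# map builds a new array that holds every block result.
mapped = contacts.map { |address| build_email.call(address) }

# each discards the block results and returns the original array.
eached = contacts.each { |address| build_email.call(address) }

puts mapped.size             # number of retained results
puts eached.equal?(contacts) # whether each handed back the receiver itself
```

With .each, the per-iteration results become garbage as soon as the block returns, unless something else (a queue client, a logger, an instance variable) still holds a reference to them.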

Solution

The source of your issue seems to be covered in this Heroku article.