ruby-on-rails heroku cloud9-ide puma

How can I get around Heroku's HTTP 30 second limit?


I inherited a Rails app that is deployed on Heroku (I think). I edit it in AWS's Cloud9 IDE and, for now, do everything in development mode. The app's purpose is to process large amounts of survey data and output a PDF report. This works for small reports with around 10 rows of data, but when I load a report that queries a data upload of 5000+ rows to build an HTML page that gets converted to a PDF, it takes around 105 seconds, far longer than the 30 seconds Heroku allots for HTTP requests.

Heroku says this on their website, which gave me some hope:

"Heroku supports HTTP 1.1 features such as long-polling and streaming responses. An application has an initial 30 second window to respond with a single byte back to the client. However, each byte transmitted thereafter (either received from the client or sent by your application) resets a rolling 55 second window. If no data is sent during the 55 second window, the connection will be terminated." (Source: https://devcenter.heroku.com/articles/request-timeout#long-polling-and-streaming-responses)

This sounds excellent to me: I can just send a byte to the client every second or so in a loop until the large PDF report is done. However, I don't know how to send or receive a byte to "reset the rolling 55 second window" they're talking about.
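
For reference, here is a minimal sketch of what "sending a byte" could look like at the Rack level, assuming the web server streams the body instead of buffering it. The class name `HeartbeatBody` and its parameters are hypothetical, not part of Rails or Heroku. The heartbeat bytes become part of the response body, so this suits an HTML or plain-text response rather than a binary PDF attachment, where leading padding would likely corrupt the file. In a Rails controller, `ActionController::Live` is the usual way to stream.

```ruby
# A Rack-style response body: any object that responds to #each and
# yields strings can be streamed chunk by chunk by the server.
class HeartbeatBody
  # interval:  seconds to wait between heartbeat chunks
  # heartbeat: filler written while the work is still running
  # work:      block that performs the slow job and returns the final payload
  def initialize(interval: 1, heartbeat: ' ', &work)
    @interval  = interval
    @heartbeat = heartbeat
    @work      = work
  end

  def each
    worker = Thread.new(&@work) # run the slow work off the streaming loop
    # Thread#join(timeout) returns nil while the thread is still running,
    # so one heartbeat chunk is emitted per elapsed interval.
    yield @heartbeat until worker.join(@interval)
    yield worker.value # final payload once the work has finished
  end
end
```

Each yielded chunk is a write to the client, which is what the quoted docs describe as resetting the rolling window. The first chunk still has to go out within the initial 30 seconds.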

Here's the part of my controller that is sending the request.

            return render pdf: pdf_name + " " + pdf_year.to_s,
                disposition: 'attachment',
                page_height: 1300,
                encoding: 'utf8',
                page_size:   'A4',
                footer: {html: {template: 'recent_grad/footer.html.erb'}, spacing: 0 },
                margin:  {   top:    10,                     # default 10 (mm)
                            bottom: 20,
                            left:   10,
                            right:  10 },
                template: "recent_grad/report.html.erb",
                locals: {start: @start, survey: @survey, years: @years, college: @college, department: @department, program: @program, emphasis: @emphasis, questions: @questions}

I'm making other requests to get to this point, but I believe the part that is causing the issue is here where the template is being rendered. My template queries the database in a finite loop that stops when it runs out of survey questions to query from.

My question is this: how can I "send or receive a byte to the client" to tell Heroku "I'm still trying to create this massive PDF so please reset the timer and give me my 55 seconds!" Is it in the form of a query? Because, if so, I am querying the MySQL database over and over again in my report.html.erb file.

Also, it used to work without issues and still works on small reports, but now I get a "504 Gateway Timeout" error on the page before the request completes, while my Puma console keeps querying the database like a madman. I assume it's a Heroku problem because the 504 error always appears after exactly 35 seconds (5 seconds to process the other parts and 30 seconds trying to finish the loop in the template so it can render correctly).

If you need more information or code, please ask! Thanks in advance

EDIT: Both of the comments below suggest possible duplicates, but neither of them has a real answer with real code. They simply refer to the docs that I am quoting here. I'm looking for a code example (or at least a way to get my foot in the door), not just a link to the docs. Thanks!

EDIT 2:

I tried what @Sergio said and installed Sidekiq. I think I'm really close, but I'm still having some issues with the worker. The worker doesn't have access to ActionView::Base, which is required for the render method in Rails, so it's not working. I can reach the worker method, which means my Sidekiq and Redis servers are running correctly, but it gets caught on the ActionView line with this error:

WARN: NameError: uninitialized constant HardWorker::ActionView

Here's the worker code:

require 'sidekiq'

Sidekiq.configure_client do |config|
  # config.redis = { db: 1 }
  config.redis = { url: 'redis://172.31.6.51:6379/0' }
end

Sidekiq.configure_server do |config|
  # config.redis = { db: 1 }
  config.redis = { url: 'redis://172.31.6.51:6379/0' }
end  

class HardWorker
  include Sidekiq::Worker
  def perform(pdf_name, pdf_year)
    av = ActionView::Base.new()
    av.view_paths = ActionController::Base.view_paths
    av.class_eval do
      include Rails.application.routes.url_helpers
      include ApplicationHelper
    end
    puts "inside hardworker"
    puts pdf_name, pdf_year

    av.render pdf: pdf_name + " " + pdf_year.to_s,
                disposition: 'attachment',
                page_height: 1300,
                encoding: 'utf8',
                page_size:   'A4',
                footer: {html: {template: 'recent_grad/footer.html.erb'}, spacing: 0 },
                margin:  {   top:    10,                     # default 10 (mm)
                            bottom: 20,
                            left:   10,
                            right:  10 },
                template: "recent_grad/report.html.erb",
                locals: {start: @start, survey: @survey, years: @years, college: @college, department: @department, program: @program, emphasis: @emphasis, questions: @questions}
  end
end


Any suggestions?

EDIT 3: I did what @Sergio said and attempted to make a PDF from an html.erb file directly and save it to a file. Here's my code:

# /app/controllers/recentgrad_controller.rb

pdf = WickedPdf.new.pdf_from_html_file('home/ec2-user/environment/gradSurvey/gradSurvey/app/views/recent_grad/report.html.erb')
            save_path = Rails.root.join('pdfs', pdf_name + pdf_year.to_s + '.pdf')
            File.open(save_path, 'wb') do |file|
              file << pdf
            end

And the error output:

RuntimeError (Failed to execute:
["/usr/local/rvm/gems/ruby-2.4.1@gradSurvey/bin/wkhtmltopdf", "file:///home/ec2-user/environment/gradSurvey/gradSurvey/app/views/recent_grad/report.html.erb", "/tmp/wicked_pdf_generated_file20190523-15416-hvb3zg.pdf"]
Error: PDF could not be generated!
 Command Error: Loading pages (1/6)
Error: Failed loading page file:///home/ec2-user/environment/gradSurvey/gradSurvey/app/views/recent_grad/report.html.erb (sometimes it will work just to ignore this error with --load-error-handling ignore)
Exit with code 1 due to network error: ContentNotFoundError
):

I have no idea what it means when it says "sometimes it will work just to ignore this error with --load-error-handling ignore". The file definitely exists and I've tried maybe 5 variations of the file path.
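
One point that applies whatever the cause of the load failure: wkhtmltopdf reads the file exactly as given and does not evaluate ERB tags, so the template has to be rendered to plain HTML before conversion. The sketch below shows that step with Ruby's standard `erb` library only. The template text and the method names `render_report_html` and `write_report_html` are hypothetical stand-ins for the real report.

```ruby
require 'erb'
require 'tmpdir'

# Hypothetical stand-in for the contents of report.html.erb.
TEMPLATE = <<~ERB
  <h1><%= title %></h1>
  <ul>
  <% rows.each do |row| %>
    <li><%= row %></li>
  <% end %>
  </ul>
ERB

# Evaluate the ERB tags and return plain HTML.
# The method's local variables (title, rows) are visible to the template
# through the binding.
def render_report_html(title, rows)
  ERB.new(TEMPLATE).result(binding)
end

# Write the rendered HTML to disk, where an HTML-to-PDF tool could read it.
def write_report_html(title, rows, dir: Dir.tmpdir)
  path = File.join(dir, 'report.html')
  File.write(path, render_report_html(title, rows))
  path
end
```

Inside a Rails app the rendering is normally done by Rails itself (for example `render_to_string` in a controller), and the resulting HTML string can be handed to `WickedPdf.new.pdf_from_string` instead of pointing wkhtmltopdf at the `.erb` file.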


Solution

  • I've had to do something like this several times. In all cases, I ended up writing a background job that does all the heavy lifting of generating the report. Because it's not a web request, it's not affected by the 30-second timeout. It goes something like this:

    1. client (your javascript code) requests a new report.
    2. server generates job description and enqueues it for your worker to pick up.
    3. worker picks the job from the queue and starts working (querying database, etc.)
    4. in the meantime, client periodically asks the server "is my report done yet?". Server responds with "not yet, try again later"
    5. worker is finished generating the report. It uploads the file to some storage (S3, for example), sets job status to "completed" and job result to the download link for the uploaded report file.
    6. server, seeing that job is completed, can now respond to client status update requests "yes, it's done now. Here's the url. Have a good day."
    7. Everybody's happy. And nobody had to do any streaming or playing with Heroku's rolling response timeouts.

    The scenario above uses short-polling. I find it the easiest to implement. But it is, of course, a bit wasteful with regard to resources. You can use long-polling or websockets or other fancy things.
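
The steps above can be sketched in plain Ruby. This is a minimal illustration, not a production setup: an in-memory hash and `Queue` stand in for Redis/Sidekiq and the database, `build_report` is a hypothetical placeholder for the slow query and PDF work, and the example.com URL stands in for the real storage link.

```ruby
require 'securerandom'

# In-memory stand-ins for a real job queue and job table (assumptions).
JOBS  = {}        # job_id => { status:, result: }
LOCK  = Mutex.new # guards JOBS across the web and worker threads
QUEUE = Queue.new

# Step 2: the server records the job, enqueues it, and returns at once.
def enqueue_report(name, year)
  job_id = SecureRandom.uuid
  LOCK.synchronize { JOBS[job_id] = { status: 'queued', result: nil } }
  QUEUE << [job_id, name, year]
  job_id
end

# Steps 4 and 6: what a status endpoint would return to the polling client.
def report_status(job_id)
  LOCK.synchronize { JOBS.fetch(job_id).dup }
end

# Hypothetical placeholder for the slow part: queries, rendering, upload.
def build_report(name, year)
  sleep 0.2
  "https://example.com/reports/#{name}-#{year}.pdf"
end

# Steps 3 and 5: the worker loop, which runs outside any web request.
def start_worker
  Thread.new do
    while (job = QUEUE.pop)
      job_id, name, year = job
      LOCK.synchronize { JOBS[job_id][:status] = 'working' }
      url = build_report(name, year)
      LOCK.synchronize { JOBS[job_id].update(status: 'completed', result: url) }
    end
  end
end

# Client-side short-polling: ask repeatedly until the job is completed.
# Returns the download URL, or nil if the attempts run out.
def wait_for_report(job_id, interval: 0.05, attempts: 100)
  attempts.times do
    status = report_status(job_id)
    return status[:result] if status[:status] == 'completed'
    sleep interval
  end
  nil
end
```

Calling `start_worker` once, then `enqueue_report('survey', 2019)` followed by `wait_for_report(id)`, walks through the whole cycle. Each individual call returns quickly, so no single request comes near the 30-second limit.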