ruby-on-rails heroku amazon-s3 temporary-files

Writing temp files to Heroku from an S3 hosted file


I have a Rails app hosted on Heroku. Here's the situation: a user should be able to upload a PDF (an instance of Batch) to our app using S3. The user should then be able to take the S3 URL of the uploaded PDF and split it into several smaller PDFs with HyPDF by specifying the file path and the pages to extract (each extracted PDF becomes an instance of Essay).

All of this is happening in the same POST request to /essays.

Here's the code I've been working with today:

def create
  if params[:essay].class == String
    batch_id = params[:batch_id].gsub(/[^\d]/, '').to_i
    break_up_batch(params, batch_id)
    redirect_to Batch.find(batch_id), notice: 'Essays were successfully created.'
  else
    @essay = Essay.new(essay_params)
    respond_to do |format|
      if @essay.save
        format.html { redirect_to @essay, notice: 'Essay was successfully created.' }
        format.json { render :show, status: :created, location: @essay }
      else
        format.html { render :new }
        format.json { render json: @essay.errors, status: :unprocessable_entity }
      end
    end
  end
end

# this is a private method
def break_up_batch(params, batch_id)
  essay_data = []
  # download the batch PDF from S3 and save it locally
  local_batch = File.open(Rails.root.join('tmp').to_s + "temppdf.pdf", 'wb') do |f|
    f.binmode
    f.write HTTParty.get(Batch.find(batch_id).document.url).parsed_response
    f.path
  end

  # create a separate essay for each grouped essay
  params["essay"].split("~").each do |data|
    data = data.split(" ")
    hypdf_url = HyPDF.pdfextract(
      local_batch,
      first_page: data[1].to_i,
      last_page: data[2].to_i,
      bucket: 'essay101',
      public: true
    )
    object = {student_name: data[0], batch_id: batch_id, url: hypdf_url[:url]}
    essay_data << object
  end

  essay_data.each {|essay| Essay.create(essay)}
  File.delete(local_batch)
end

I can't get the file to show up on Heroku; I'm checking with heroku run bash and ls tmp. When the method runs, a blank file is uploaded to S3. I've written some jQuery to populate a hidden field, which is why there's the funky splitting in the middle of the code.
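
One detail worth checking in the code above is the path passed to File.open: concatenating a string onto Rails.root.join('tmp').to_s does not insert a path separator, so the file would not land inside tmp at all. The sketch below illustrates this with plain Pathname objects; app_root is a hypothetical stand-in for Rails.root, and this may not be the only cause of the blank upload. Separately, heroku run bash starts a one-off dyno with its own ephemeral filesystem, so files written by a web dyno would not be visible there in any case.

```ruby
require 'pathname'

# Hypothetical stand-in for Rails.root, which is a Pathname in a Rails app.
app_root = Pathname.new('/app')

# String concatenation, as in the code above: no separator is inserted.
concatenated = app_root.join('tmp').to_s + 'temppdf.pdf'

# Passing both segments to Pathname#join inserts the separator.
joined = app_root.join('tmp', 'temppdf.pdf').to_s

puts concatenated # /app/tmptemppdf.pdf
puts joined       # /app/tmp/temppdf.pdf
```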


Solution

  • Turns out using the File class wasn't the right way to go about it. But using Tempfile works!

    def break_up_batch(params, batch_id, current_user)
      essay_data = []
      # download the batch PDF from S3 into a temp file under tmp/
      tempfile = Tempfile.new(['temppdf', '.pdf'], Rails.root.join('tmp'))
      tempfile.binmode
      tempfile.write HTTParty.get(Batch.find(batch_id).document.url).parsed_response
      tempfile.close
      save_path = tempfile.path

      # create a separate essay for each grouped essay
      params["essay"].split("~").each do |data|
        data = data.split(" ")
        hypdf_url = HyPDF.pdfextract(
          save_path,
          first_page: data[1].to_i,
          last_page: data[2].to_i,
          bucket: 'essay101',
          public: true
        )
        object = {student_name: data[0], batch_id: batch_id, url: hypdf_url[:url]}
        essay_data << object
      end

      essay_data.each do |essay|
        saved_essay = Essay.create(essay)
        # update_attributes was removed in Rails 6.1; use update there
        saved_essay.update_attributes(:company_id => current_user.company_id) if current_user.company_id
      end
      tempfile.unlink
    end
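
If HyPDF or the download raises partway through, the tempfile.unlink at the end of the method above is never reached. A minimal sketch of one way to guard against that, using only Ruby's standard library, is to wrap the temp file in a helper with an ensure clause. The helper name with_temp_pdf and its dir parameter are hypothetical and not part of the original answer; in the Rails app, dir could be Rails.root.join('tmp') as above.

```ruby
require 'tempfile'
require 'tmpdir'

# Hypothetical helper: writes binary data to a temp file, yields its path,
# and removes the file afterwards even if the block raises.
def with_temp_pdf(data, dir = Dir.tmpdir)
  tempfile = Tempfile.new(['temppdf', '.pdf'], dir)
  tempfile.binmode
  tempfile.write(data)
  tempfile.close # flush to disk; the path stays valid until unlink
  yield tempfile.path
ensure
  tempfile.unlink if tempfile
end

# Usage: the block receives a path that exists only for its duration.
size = with_temp_pdf("%PDF-1.4 fake bytes") do |path|
  File.size(path)
end
puts size
```

Inside break_up_batch, the HyPDF.pdfextract loop would go in the block, with the yielded path taking the place of save_path.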