ruby-on-rails amazon-s3 initializer

Create objects in database as Rails server begins


I'm working with a Rails application that currently creates instances of other classes before and after a User is created or saved. The problem with this approach is that one class reads from a very large AWS S3 bucket containing over 7,000 objects that I need to add to our database. The whole process takes about 32 ms to create the objects and 2911 ms to add them to the Batch database. I am adding them to our database instead of simply reading the AWS bucket for two reasons: 1) to add properties to these objects, and 2) to make the objects available to the iPhone application.

I'd love to find a way to read this bucket and load its contents into the database before the Rails application finishes loading, or to run the code in the background.

Here is my Batch.rb code:

class Batch < ActiveRecord::Base
  serialize :folder, JSON
  has_many :tops
  has_many :bottoms

  def access_bucket
    AwsAccess.new('curateanalytics', [], "", {}).sort_through_bucket
  end
  class AwsAccess
    def initialize(bucket_name, array, current, obj)
      @bucket_name = bucket_name
      @array = array
      @current = current
      @obj = obj
      # Set for each matching object by create_new_instances
      @newfolder = nil
      @newbatch = nil
      @newurl = nil
    end

    def access_bucket
      return AWS::S3.new.buckets[@bucket_name]
    end

    def sort_through_bucket
      access_bucket.objects.each do |obj|
        if obj_is_swipe_batch?(obj)
          create_new_instances(obj)
          if !obj_contains_key?
            add_newfolder_key
          end
          if !current_equals_batch?
            @current = @newbatch
            if array_not_array?
              @obj[@newfolder] << @array
            end
          end
          # Parse the properties once per object instead of re-reading the JSON file for every check
          properties = Properties.new(@newurl).find_properties
          file_name = @newurl.split("/").last.gsub("%26","&")
          attributes = {:batch_folder => @newfolder, :batch_number => @newbatch, :file_name => file_name, :url => @newurl, :properties => properties}
          if properties[:main_category] == "Bottoms"
            @bottom = Bottoms.create(attributes)
          elsif properties[:main_category] == "Tops"
            @top = Tops.create(attributes)
          end
        end
      end
    end

    def obj_is_swipe_batch?(obj)
      return ((obj.key =~ /swipe batches/) && (obj.key =~ /jpg/))
    end

    def create_new_instances(obj)
      @newfolder = obj.key.split("/")[1]
      @newbatch = obj.key.split("/")[obj.key.split("/").length-2]
      @newurl = "https://s3.amazonaws.com/curateanalytics/" + obj.key.gsub('&', '%26').gsub('swipe ', 'swipe+')
    end

    def obj_contains_key?
      @obj.key?(@newfolder)
    end

    def add_newfolder_key
      @obj.merge!(@newfolder => [])
    end

    def current_equals_batch?
      @current == @newbatch
    end

    def array_not_array?
      @array != []
    end
  end

  class Properties
    def initialize(bucket_url)
      @bucket_url = bucket_url
      @hash = {}
    end

    def find_properties
      read_json
      parse_json
      parse_main
      parse_sub
      return @hash
    end

    def read_json
      @json = JSON.parse(File.read(File.join(Rails.root, 'public', 'DatabaseArray.json')))
    end

    def parse_json
      @json.each do |main|
        @main = main
      end
    end

    def parse_main
      @main.each do |sub|
        @sub = sub
      end
    end

    def parse_sub
      @sub.gsub("\"","")[1..-2].split(",").each do |properties|
        @property = properties.split(":")
        is_everything
      end
    end

    def is_URL?
      @property.first == "URL"
    end

    def is_File_Name?
      @property.first == "File_Name"
    end

    def is_Main?
      @property.second == "{Main_Category"
    end

    def is_everything
      if !is_URL? && !is_File_Name? && is_Main?
        hash_merge(@property.second.gsub!("{",""),@property.last)
      elsif !is_URL? && !is_File_Name?
        hash_merge(@property.first,@property.last)
      end
    end

    def hash_merge(name, property)
      @hash.merge!(name.parameterize.underscore.to_sym => property)
    end
  end
end
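
To make the key handling in create_new_instances easier to follow, here is a minimal standalone sketch of the same string operations. The helper name parse_swipe_key and the sample key are hypothetical (the real bucket layout is not shown in the question); only the split and gsub calls are taken from the code above.

```ruby
# Hypothetical helper that mirrors create_new_instances from the question,
# but returns the derived values instead of storing them in instance variables.
def parse_swipe_key(key)
  parts = key.split("/")
  url = "https://s3.amazonaws.com/curateanalytics/" +
        key.gsub("&", "%26").gsub("swipe ", "swipe+")
  {
    folder: parts[1],                   # second path segment
    batch: parts[parts.length - 2],     # segment just before the file name
    url: url,
    file_name: url.split("/").last.gsub("%26", "&")
  }
end

# Made-up key, used only to illustrate the transformation.
info = parse_swipe_key("swipe batches/Spring/batch 12/red & blue top.jpg")
puts info[:folder]     # Spring
puts info[:batch]      # batch 12
puts info[:file_name]  # red & blue top.jpg
```

Note that only "&" and the space in "swipe " are escaped, so any other spaces in the key end up unescaped in the URL.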

So far I've looked into putting this code in an initializer. Using binding.pry I can access the /config/initializer/batch.rb file, and it looks exactly the same as this batch.rb file, but the code never runs.
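
One possible explanation, offered as an assumption because the initializer itself is not shown: a file that only defines classes does not call any of their methods when it is loaded. The sketch below illustrates this in plain Ruby; BucketImporter and its run counter are hypothetical stand-ins, not part of the original code.

```ruby
# Hypothetical stand-in for the Batch/AwsAccess classes in the question.
class BucketImporter
  @runs = 0

  class << self
    attr_accessor :runs
  end

  def import
    # The real code would read the bucket and create records here.
    self.class.runs += 1
  end
end

# Loading the class definition alone executes nothing inside #import.
RUNS_AFTER_DEFINITION = BucketImporter.runs

# Work only happens once something explicitly calls the method.
BucketImporter.new.import

puts "runs after definition: #{RUNS_AFTER_DEFINITION}, after call: #{BucketImporter.runs}"
```

If this is what happened, the initializer would need an explicit call after the class definitions before any import work would start.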


Solution

  • I solved this problem by creating a custom Rake task. I came to this conclusion from this post's second answer. After researching Rake tasks and threads, I decided that running a Rake task beforehand to populate my databases would work best, since the S3 bucket I'm reading from will not change unless I allow it. For anyone looking into this question, I needed to write

    task :task_name => :environment do
      DB0.connection
      DB1.connection
      ...code...
    end
    

    and rewrite my code without methods.

    Although running the Rake task takes a short while, my user sign-in on the website now runs much quicker.
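
The shape of such a task can be sketched outside of Rails as follows. This is a minimal illustration under stated assumptions, not the author's actual task: the task name :populate_batches, the IMPORTED_KEYS array (standing in for the database), and the sample keys are all made up, and the :environment task is stubbed because Rails normally provides it.

```ruby
require "rake"

# Hypothetical stand-in for the database: the real task would create
# Tops/Bottoms records instead of appending to an array.
IMPORTED_KEYS = []

# Rails normally supplies the :environment task, which loads the application.
# It is stubbed here only so the sketch runs outside of Rails.
Rake::Task.define_task(:environment)

# In a Rakefile or lib/tasks/*.rake file this would be written with the DSL:
#   task :populate_batches => :environment do ... end
# The task name :populate_batches is made up for this example.
Rake::Task.define_task(:populate_batches => :environment) do
  # Hypothetical keys; the real code iterates over the S3 bucket's objects.
  sample_keys = [
    "swipe batches/Spring/batch 12/top.jpg",
    "swipe batches/Spring/batch 12/notes.txt"
  ]
  sample_keys.each do |key|
    # Same filter as obj_is_swipe_batch? in the question.
    IMPORTED_KEYS << key if key =~ /swipe batches/ && key =~ /jpg/
  end
end

# Equivalent to running `rake populate_batches` from the command line.
Rake::Task[:populate_batches].invoke
```

Because the task body runs once when invoked rather than on every sign-in, the slow import is kept out of the request path, which matches the speed-up described above.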