Tags: ruby-on-rails, thread-safety, mutex, puma

Rails Instance Variable Conflict between Requests


I have an array of prices: @price_queue.

It is persisted in PostgreSQL as Prices.find(1).price_list and is seeded.

When a transaction is initiated, it takes the next price from @price_queue and sends it to a payment processor for the charge.

def create

  price_bucket = Prices.find(1)

  price_bucket.with_lock do
    @price_queue = price_bucket.price_list
    @price = @price_queue.shift
    price_bucket.save
  end

  customer = Stripe::Customer.create(
      :email => params[:stripeEmail],
      :card  => params[:stripeToken],
  )
  charge = Stripe::Charge.create(
      :customer => customer.id,
      :amount   => @price * 100,
  )

  if charge["paid"]
    Pusher['price_update'].trigger('live', {message: @price_queue[0]})
  end

If the transaction succeeds, it should get rid of the @price it holds. If it fails, the price should be placed back into @price_queue.

rescue Stripe::CardError => e
  flash[:error] = e.message
  @price_queue.unshift(@price)
  Pusher['price_update'].trigger('live', {message: @price_queue[0]})
  price_bucket.price_list = @price_queue
  price_bucket.save
  redirect_to :back
end

I found a major bug when testing, at millisecond intervals, two failing transactions followed by a passing one.

price_queue = [100, 101, 102, 103, ...]

User 1 gets 100 (confirmed on Stripe dashboard)

User 2 gets 101 (confirmed on Stripe dashboard)

User 3 gets 102 (confirmed on Stripe dashboard)

Expected:

Assuming no unshift has occurred yet

price_queue = [103, 104, ...]

User 1 fails, puts 100 back

price_queue = [100, 103, ...]

User 2 fails, puts 101 back

price_queue = [101, 100, 103, ...]

User 3 passes, 102 disappears

What really happens:

price_queue = [101, 102, 103, 104, ...]

As we can see, 100 has disappeared even though it should be back in the queue, 101 is back in the queue (most probably not by the expected behaviour), and 102 has been put back in the queue even though its request should never have entered the rescue path.

I am using Puma on Heroku.

I have tried storing the price in session[:price], cookie[:price], assigning it to a local variable price, to no avail.

I've been reading around and figured this could be a scope problem caused by a multithreaded environment, where @price is leaking to other controller actions and being reassigned or mutated.

Any help would be greatly appreciated. (Also feel free to critique my code.)


Solution

  • This has nothing to do with instance variables leaking or anything like that - just some classic race conditions going on here. Two possible timelines:

    • request 1 fetches prices from the database (the array is [100,101,102])
    • request 2 fetches prices from the database (the array is [100,101,102] - a separate copy)
    • request 1 locks prices, removes a price and saves
    • request 2 locks prices, removes a price and saves

    The important thing here is that request 2 is using an old copy of prices that doesn't include the changes made by request 1: both instances will shift the same value off the array. (The requests could be different threads on the same puma worker, different workers, or even different dynos - it doesn't matter.)
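
    Here is that first timeline stepped through in plain Ruby (just a sketch: a local variable stands in for the database row, and the lock is omitted because by the time it is taken each request already holds its own stale copy):

      db = [100, 101, 102, 103]   # stands in for Prices.find(1).price_list

      req1_copy = db.dup          # request 1 fetches prices
      req2_copy = db.dup          # request 2 fetches prices - a separate, identical copy

      price1 = req1_copy.shift    # request 1 "locks", shifts a price and saves
      db = req1_copy.dup

      price2 = req2_copy.shift    # request 2 "locks", shifts from its stale copy and saves
      db = req2_copy.dup

      p [price1, price2]          # => [100, 100] - the same price was handed out twice
      p db                        # => [101, 102, 103] - only one element ever left the queue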

    Another failure scenario would be

    • request 1 fetches prices, removes a price and saves. The array in the database is [101,102,103,...], the in memory array is [101,102,103, ...]
    • request 2 fetches prices, removes a price and saves. The array in the database is [102,103,...]
    • request 2's stripe transaction succeeds
    • request 1's stripe transaction fails, so it puts 100 back onto the array and saves. Because you haven't reloaded from the database, this overwrites the changes from request 2 (stepped through in the sketch below).
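
    The second timeline in the same style (again just a sketch, with a local variable standing in for the database row):

      db = [100, 101, 102, 103]

      req1_copy = db.dup
      req1_price = req1_copy.shift    # request 1 takes 100 and saves
      db = req1_copy.dup              # database is now [101, 102, 103]

      req2_copy = db.dup
      req2_price = req2_copy.shift    # request 2 takes 101 and saves
      db = req2_copy.dup              # database is now [102, 103]; request 2's charge succeeds

      req1_copy.unshift(req1_price)   # request 1's charge fails: it unshifts 100 onto its
      db = req1_copy.dup              # stale in-memory copy and saves that copy back

      p db   # => [100, 101, 102, 103] - request 2's removal of 101 was silently overwritten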

    To fix this properly I'd split out the logic for acquiring and replacing a price into their own methods - something like

    class Prices < ActiveRecord::Base

      def self.with_locked_row(id)
        transaction do
          row = lock.find(id)
          result = yield row
          row.save # on older versions of active record you need to tell rails about in place changes
          result
        end
      end

      def self.get_price(id)
        with_locked_row(id) { |row| row.price_list.shift }
      end

      def self.restore_price(id, value)
        with_locked_row(id) { |row| row.price_list.unshift(value) }
      end
    end
    

    You'd then do

      Prices.get_price(1)                 # get a price
      Prices.restore_price(1, some_value) # put a price back if the charging failed
    

    The key differences between this and your original code are that:

    • I acquire a locked row and update it, rather than acquiring a row and then locking it. This eliminates the window in the first scenario I outlined
    • I don't keep using the same active record object after I've released the lock - as soon as the lock has been released someone else might be changing it behind your back.
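
    Putting those pieces together, the create action from the question might end up looking roughly like this (only a sketch: it reuses the Stripe and Pusher calls from the question, keeps the hard-coded row id of 1, and does a fresh read for the Pusher message instead of holding onto the row):

      def create
        @price = Prices.get_price(1)    # shift the next price under the row lock

        customer = Stripe::Customer.create(
            :email => params[:stripeEmail],
            :card  => params[:stripeToken],
        )
        charge = Stripe::Charge.create(
            :customer => customer.id,
            :amount   => @price * 100,
        )

        if charge["paid"]
          # fresh read purely for display - don't keep reusing a previously locked row
          Pusher['price_update'].trigger('live', {message: Prices.find(1).price_list[0]})
        end
      rescue Stripe::CardError => e
        flash[:error] = e.message
        Prices.restore_price(1, @price)  # put the price back under the lock
        Pusher['price_update'].trigger('live', {message: Prices.find(1).price_list[0]})
        redirect_to :back
      end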

    You could also do this via optimistic locking (i.e. without explicit row locks). The only thing that would change code-wise is with_locked_row:

        def self.with_locked_row(id)
          begin
            row = find(id)  # no explicit row lock - rely on the lock_version column
            result = yield row
            row.save # on older versions of active record you need to tell rails about in place changes
            result
          rescue ActiveRecord::StaleObjectError
            retry
          end
        end
    

    You would need to add a non-null integer column named lock_version, with a default value of 0, for this to work.
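
    A minimal migration sketch for that column (assuming the table is called prices):

      class AddLockVersionToPrices < ActiveRecord::Migration
        def change
          add_column :prices, :lock_version, :integer, default: 0, null: false
        end
      end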

    Which performs better depends on how much concurrency you experience, what other accesses there are to this table and so on. Personally I would default to optimistic locking unless I had a compelling reason to do otherwise.