Tags: ruby, amazon-redshift, pg

Long-running Redshift transaction from Ruby


I run several SQL statements in a transaction using the Ruby pg gem. The problem I bumped into is that the connection times out on these queries due to the firewall setup. The solution proposed here does not work, because it requires a JDBC connection string, and I'm in Ruby (JRuby is not an option). Moving the driver program to AWS to get inside the firewall is not an option either.

The code that I have is along the following lines:

conn = RedshiftHelper.get_redshift_connection
begin
  conn.transaction do
    # run my queries
  end
ensure
  conn.flush
  conn.finish
end

I'm now looking into the PG asynchronous API. I'm wondering if I can use is_busy to prevent the firewall from timing out, or something to that effect. I can't find good documentation on the topic, though. I'd appreciate any hints.
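For reference, the polling pattern I have in mind would look roughly like this. This is only a sketch against a PG::Connection-like object; run_async and the poll interval are my own names, not part of the gem:

```ruby
# Sketch: send the query without blocking, then poll until a result is
# buffered, so the driver keeps touching the connection instead of
# sitting idle inside one long blocking libpq call.
def run_async(conn, sql, poll_interval = 0.5)
  conn.send_query(sql)           # returns immediately
  loop do
    conn.consume_input           # absorb whatever the server has sent
    break unless conn.is_busy    # false once a full result is buffered
    sleep poll_interval
  end
  conn.get_result                # non-blocking now that is_busy is false
end
```

Whether this client-side activity is enough to keep a stateful firewall happy is exactly what I'm unsure about, since the polling happens locally and may not generate traffic on the wire.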

PS: I have solved this problem for a single query: I can trigger it asynchronously and track its completion using the STV_INFLIGHT Redshift system table. A transaction does not work this way, as I have to keep the connection open.
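For completeness, the single-query tracking looks roughly like this. It's a sketch: the helper name and the label-matching scheme (embedding a unique tag in the query text) are my own assumptions:

```ruby
# Hypothetical helper: check whether a uniquely labeled query is still
# listed in Redshift's STV_INFLIGHT system table. Returns true once
# Redshift no longer reports the query as in flight.
def query_finished?(conn, label)
  res = conn.exec(
    "SELECT COUNT(*) AS n FROM stv_inflight WHERE text LIKE '%#{label}%'"
  )
  res[0]['n'].to_i.zero?
end
```

The driver can fire the query, drop the connection, and poll with a fresh connection until this returns true; that's exactly what a multi-statement transaction rules out.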


Solution

  • Ok, I nailed it down. Here are the facts:

    1. Redshift is based on Postgres 8.0. To check that, connect to the Redshift instance using psql and note that it reports "server version 8.0".
    2. Keepalive requests are specified on the level of tcp socket (link).
    3. Postgres 8.0 does not support keepalive option when specifying a connection string (link to 9.0 release changes, section E.19.3.9.1 on libpq)
    4. The PG gem in Ruby is a wrapper around libpq.

    Based on the facts above, TCP keepalive is not supported by Redshift's connection string. However, PG allows you to retrieve the socket used by the established connection. This means that even though libpq does not set the keepalive option, we can still set it manually. The solution is thus:

    class Connection
      attr_accessor :socket, :pg_connection

      def initialize(conn, socket)
        @socket = socket
        @pg_connection = conn
      end

      # Forward everything else to the underlying pg connection
      def method_missing(m, *args, &block)
        @pg_connection.send(m, *args, &block)
      end

      # Close the duplicated socket together with the pg connection
      def close
        @socket.close
        @pg_connection.close
      end
      alias_method :finish, :close
    end

    def get_connection
      conn = PGconn.open(...)
      socket_descriptor = conn.socket
      socket = Socket.for_fd(socket_descriptor)
      # Use the TCP keep-alive feature
      socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, 1)
      # Maximum keep-alive probes before assuming the connection is lost
      socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPCNT, 5)
      # Interval (in seconds) between keep-alive probes
      socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPINTVL, 2)
      # Maximum idle time (in seconds) before keep-alive probes start
      socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPIDLE, 2)
      socket.autoclose = true
      Connection.new(conn, socket)
    end
    

    The reason I introduce a proxy Connection class is Ruby's tendency to garbage-collect IO objects (like sockets) once they go out of scope. The connection and the socket therefore need to live in the same scope, which this proxy class achieves. My Ruby knowledge is not deep, so there may be a better way to handle the socket object.
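    Thanks to method_missing, the wrapper responds like a plain PG connection, so the original transaction code keeps working unchanged. A usage sketch (with_keepalive_connection is a name I made up; it assumes the get_connection above):

```ruby
# Hypothetical convenience wrapper around get_connection: run a block
# inside a transaction on a keepalive-enabled connection, and always
# close both the duplicated socket and the pg connection afterwards.
def with_keepalive_connection
  conn = get_connection
  begin
    conn.transaction do
      yield conn     # run the long-running statements here
    end
  ensure
    conn.finish      # Connection#finish closes socket + pg connection
  end
end
```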

    This approach works, but I would be happy to learn if there are better/cleaner solutions.