Search code examples
ruby-on-railsrubypostgresqlquery-optimizationtimeoutexception

Optimize database query for time series data retrieving using postgresql and ruby on rails


Hello guys I'm working on a interesting real time application. The application is as follows.I have a meter model and meter_info model

calss Meter
  has_many :meter_infos
  # filed: id 
end

class MeterInfo
  belongs_to :meter
  # field: meter_id, voltage 
end

In every two minutes a new data is being saved to meter_info table.So you can imagine there are a huge data set over there.

Now what I want do is to find out exactly one voltage record of 10 meters each at a time in 10 minutes interval within 1 day. So the result would be something like this

id           created_at          meter_id      voltage
2001     2017-10-19 15:40:00        2             100
2001     2017-10-19 15:45:00        1             100
2001     2017-10-19 15:39:00        3             100
2001     2017-10-19 15:48:00        4             100
2001     2017-10-19 15:38:00        5             100
2001     2017-10-19 15:42:00        6             100
...
...

I've tried several queries but as it's taking too much time to find out the record, the request gets timeouted. Here is what I have tried for

(('2017-07-02 00:00:00').to_datetime.to_i .. 
  ('2017-07-02 23:59:59').to_datetime.to_i).step(10.minutes) do |date|
                query = "SELECT  created_at, meter_id, voltage
                FROM meter_infos
                WHERE created_at between  '#{Time.at(date).utc}' and 
                '#{Time.at(date).utc + 10.minutes}'
                AND meter_id in (1,2,3,4,5)
                ORDER BY id desc limit 1"

                voltages = ActiveRecord::Base.connection.execute(query)

end

Which is timeouted even in the development environment. Then I've tried to use Postgresql's generated_series like below

  query= "SELECT meter_id,voltage,  count(id) as ids
              , GENERATE_SERIES( timestamp without time zone '2017-10-19',
                  timestamp without time zone '2017-10-19',
                 '10 min') as time_range
              from meter_infos
              where meter_infos.created_at between '2017-10-19 00:00:01'::timestamp and  '2017-10-19 23:59:59'::timestamp
              and meter_infos.meter_id in (1,2,3,4,5)
                  GROUP BY meter_id, voltage
              ORDER BY meter_id ASC limit 1"

            sbps_plot = ActiveRecord::Base.connection.execute(query)

Which is faster but gives me wrong data. I am using Ruby on Rails and Postgresql. Can somebody help me to write the faster query to find out data against time or suggest me any procedure to handle time series data analysis. Thanks in advance.


Solution

  • You have records every two minutes, but you want to get a sample record from ten minute intervals. Here's my suggested solution:

    You can take the modulus of the epoch time of the created_at timestamp with 600 (ten minutes in seconds). Then compare this against some 'tolerance' value (e.g. 119 seconds or less) in case the timestamps of your records aren't aligned to perfect ten minute intervals. Think of it of retrieving the first record with a created_at inside a 2 minute window following each 10 minute interval of the day.

    For example,

    MeterInfo
      .where(
        meter_id: [1, 2, 3, 4, 5], 
        created_at: your_date.beginning_of_day..your_date.end_of_day
      )
      .where("(cast(extract(epoch from created_at) as integer) % 600) < 119")
    

    Give that a try and see if it works for you.