Search code examples
google-bigquerytime-seriesfinance

Google BigQuery aggregate OHLC data over time window


There is time series trading transactions history stored with google's BigQuery.

# Transaction history scheme

exchange_id INTEGER REQUIRED  
from_id INTEGER REQUIRED    
to_id   INTEGER REQUIRED    
price   FLOAT   REQUIRED    
size    FLOAT   REQUIRED    
ts  TIMESTAMP   REQUIRED    
is_sell BOOLEAN NULLABLE    
_PARTITIONTIME  TIMESTAMP   NULLABLE    

exchange_id - platform where transation occured from_id - base symbol
to_id - quote symbol
price - trade price
size - trade quantity

I need to aggregate OHLC data over 30 seconds time interval grouped by
exchange_id, from_id, to_id. How can I do this in the BigQuery?

# Required OHLC aggregated data scheme

ts  TIMESTAMP   REQUIRED 
exchange_id INTEGER REQUIRED  
from_id INTEGER REQUIRED    
to_id   INTEGER REQUIRED    
open   FLOAT   REQUIRED    
high   FLOAT   REQUIRED    
low   FLOAT   REQUIRED    
close   FLOAT   REQUIRED    
volume    FLOAT   REQUIRED 
_PARTITIONTIME  TIMESTAMP   NULLABLE       

open - first price in interval
high - highest price..
low - lowest price..
close - last price..
volume - SUM of all trade size's in current interval


Most promising ideas were:

SELECT 
    TIMESTAMP_SECONDS(
      UNIX_SECONDS(ts) -
      60 * 1000000
    ) AS time,
    exchange_id,
    from_id,
    to_id,
    MIN(price) as low,
    MAX(price) as high,
    SUM(size) as volume
FROM 
    `table`
GROUP BY
  time, exchange_id, from_id, to_id
ORDER BY
  time

And this one:

SELECT 
  exchange_id,from_id,to_id,
  MAX(price) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as high,
  MIN(price) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as low,
  SUM(size) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as volume,
FROM   [table];


# returns:
1   1   4445    3808    9.0E-8  9.0E-8  300000.0     
2   1   4445    3808    9.0E-8  9.0E-8  300000.0     
3   1   4445    3808    9.0E-8  9.0E-8  300000.0     
...
14  1   4445    3808    9.0E-8  9.0E-8  865939.3721800799    
15  1   4445    3808    9.0E-8  9.0E-8  865939.3721800799    
16  1   4445    3808    9.0E-8  9.0E-8  865939.3721800799    

But nothing of this works. It seems that I missing something important about sliding window in BigQuery.


Solution

  • Below is for BigQuery Standard SQL

    #standardsql
    SELECT 
      exchange_id, 
      from_id, 
      to_id,
      TIMESTAMP_SECONDS(DIV(UNIX_SECONDS(ts), 30) * 30) time,
      ARRAY_AGG(price ORDER BY ts LIMIT 1)[SAFE_OFFSET(0)] open,
      MAX(price) high,
      MIN(price) low,
      ARRAY_AGG(price ORDER BY ts DESC LIMIT 1)[SAFE_OFFSET(0)] close,
      SUM(size) volume
    FROM `yourproject.yourdataset.yourtable`
    GROUP BY 1, 2, 3, 4