Search code examples
apache-flinkflink-sql

How to write a SQL for a calculation based on incremental window of batch table


My requirement is to calculate based on an incremental size window for a batch table.

For example, the first window has 1 row, the second window has 2 rows(including 1 row from the 1st window and a new row), then 3 rows in the 3rd window(including 2 rows from the 2nd window and a new row), and so on.

For example:

Source table:

datetime | productId | price |

3-1 | p1 | 10 |

3-2 | p1 | 20 |

3-3 | p1 | 30 |

3-4 | p1 | 40 |

Result table:

datetime | productId | average|

3-1 | p1 | 10/1 |

3-2 | p1 | (10+20)/2 |

3-3 | p1 | (10+20+30)/3 |

3-4 | p1 | (10+20+30+40)/4 |

I am trying to find a way to implement this requirement with Sql, to me seems the OVER action can do that but not yet implemented in flink, so I need an alternative way.

BTW:

I tried to use a TUMBLE window of 1 day and store the previous value in the user defined aggregation object but failed as the aggregation object will be reused by all product not a single object for each product


Solution

  • The OVER clause on a batch table is not supported by Flink's SQL yet. You can track the status of this effort here.

    However, did you consider to implement this behavior on a streaming table instead? Streaming tables can also read from static files such as CSV files and many operations are supported there as well. This depends on the other operations you want to use in your query, though.