partitioning, complex-event-processing, siddhi

Duplicate Siddhi partitions or multiple queries in same partition


We are trying to determine which approach is recommended, and which performs better, when multiple queries rely on the same partition. Suppose each event carries just a name (string) and a value (integer), and we want two separate queries that detect patterns in the value for the same name. There are two ways to write this:

Option 1:

define stream InputStream (name string, value int);
partition with (name of InputStream)
begin
    from every s1=InputStream[value == 1],
    s2=InputStream[value == 2]
    select s1.value as initial, s2.value as final
    insert into OutputStream1;
end;
partition with (name of InputStream)
begin
    from every s1=InputStream[value == 10],
    s2=InputStream[value == 11]
    select s1.value as initial, s2.value as final
    insert into OutputStream2;
end;

Option 2:

define stream InputStream (name string, value int);
partition with (name of InputStream)
begin
    from every s1=InputStream[value == 1],
    s2=InputStream[value == 2]
    select s1.value as initial, s2.value as final
    insert into OutputStream1;

    from every s1=InputStream[value == 10],
    s2=InputStream[value == 11]
    select s1.value as initial, s2.value as final
    insert into OutputStream2;
end;

Option 1: It should generate a separate partition stream for each query and be able to execute them in parallel, but it also carries the overhead of generating 2 partition streams for the same name, unless Siddhi is smart enough to realize the partition streams are identical and put them in the same stream.

Option 2: The queries are in the same partition stream, so I imagine it will execute them sequentially (unless Siddhi is smart enough to realize the queries don't depend on each other, since there are no inner streams). The bonus is that only 1 partition stream needs to be generated.
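
For contrast, here is a minimal sketch of what a real dependency between two queries in one partition would look like. The inner stream name #CandidateStream is made up for illustration, and the sketch assumes a sequence can read from an inner stream. Since an inner stream is only visible inside its own partition, queries linked this way could not simply be split into two partitions as in Option 1.

```siddhi
define stream InputStream (name string, value int);

partition with (name of InputStream)
begin
    -- Hypothetical first stage: forward only the values of interest
    -- to an inner stream (prefixed with '#'), scoped to this partition.
    from InputStream[value == 1 or value == 2]
    select name, value
    insert into #CandidateStream;

    -- Second stage reads from the inner stream, so it depends on the
    -- query above and has to live in the same partition.
    from every s1=#CandidateStream[value == 1],
    s2=#CandidateStream[value == 2]
    select s1.value as initial, s2.value as final
    insert into OutputStream1;
end;
```

Neither query in Option 1 or Option 2 uses an inner stream like this, which is why they are independent of each other.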

Either option should work fine, but which one is more performant? Or will they both be functionally the same once Siddhi processes them?


Solution

  • Since you are using Siddhi 4, I would recommend Option 2, because the memory overhead of Option 1 is very high in that version.

    However, this issue is fixed in Siddhi 5; after upgrading, you can use Option 1 for better performance.