c# azure twitter stream streaminsight

Twitter stream routing using a 'bus' or StreamInsight


I'm stuck on an architectural question regarding the following:

Edit:

I might be overthinking the problem, so let me rephrase the question. NServiceBus seems to be made for messaging and routing (of stream-like data?), whereas StreamInsight seems to be made for event stream processing, event querying, and correlation.

Are there any benefits (e.g. in terms of scalability or redundancy) of using Approach 1 over Approach 2?

"Approach 1"

Use a bus (e.g. NServiceBus) to get the data into the database, and use StreamInsight solely for querying/correlating.

"Approach 2"

Skip NServiceBus and instead use StreamInsight input/output adapters as pub/sub, where the subscriber is the output adapter that actively pushes the data into the database.


Original:

We are creating an application where Twitter data is streamed into our environment. This data is:

  1. Stored as raw (event) input data
  2. Parsed/filtered
  3. Queried (using StreamInsight CEP)
  4. Remaining data after previous steps is stored as complex event

For step 1, I'm not sure what the preferred approach is:

  1. Use StreamInsight to split the data stream in two: one output adapter stores the raw data in a database, and another output adapter sends the data to another input adapter for further parsing/filtering (step 2).

-or-

  1. Use a different technology (MSMQ? Azure Service Bus?) for 'routing the raw data stream to the database'

Any guidance is greatly appreciated!


Solution

  • The volume that you are talking about isn't much for StreamInsight, so that isn't a problem. There's also no reason to add complexity; you seem to be overthinking the problem. Using StreamInsight 2.1, it's easy to create a sink that sends some data to the database and then have additional queries that do further analytics. This would all occur in a single "Process" (not to be confused with a Windows process), and any set of queries can have different sinks for output. Make sense? If you want to see an example, you can download this demo: http://1drv.ms/1nPs2cA. Also, look at my blog at www.devbiker.net.
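
To make the "one source, several sinks" idea concrete, here is a minimal sketch in plain C#. It deliberately does not use the StreamInsight API; it only illustrates the shape of the pipeline. The `Tweet` class, the `TweetPipeline.Run` helper, the keyword filter, and the in-memory lists standing in for database tables are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tweet shape; the question does not define one.
public sealed class Tweet
{
    public Tweet(string user, string text) { User = user; Text = text; }
    public string User { get; }
    public string Text { get; }
}

public static class TweetPipeline
{
    // Hands every tweet to rawSink, and only the tweets that pass
    // the filter to filteredSink. Both sinks are fed from the same
    // single pass over the source, so no separate bus is involved.
    public static void Run(
        IEnumerable<Tweet> source,
        Action<Tweet> rawSink,
        Func<Tweet, bool> filter,
        Action<Tweet> filteredSink)
    {
        foreach (var tweet in source)
        {
            rawSink(tweet);              // step 1: store the raw event
            if (filter(tweet))           // steps 2-3: parse/filter/query
            {
                filteredSink(tweet);     // step 4: store what remains
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var tweets = new[]
        {
            new Tweet("alice", "Trying out StreamInsight"),
            new Tweet("bob", "Lunch time"),
            new Tweet("carol", "StreamInsight sinks are handy"),
        };

        // In-memory lists stand in for the raw-data and complex-event tables.
        var rawStore = new List<Tweet>();
        var matchStore = new List<Tweet>();

        TweetPipeline.Run(
            tweets,
            rawStore.Add,
            t => t.Text.Contains("StreamInsight"),
            matchStore.Add);

        Console.WriteLine("raw: {0}, matched: {1}", rawStore.Count, matchStore.Count);
    }
}
```

In StreamInsight itself the source, the queries, and the sinks would be defined against the server and bound together into one process; the sketch only shows why a single process with two outputs covers both the "store raw data" and the "store filtered data" requirements.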