javascript firebase twitter google-cloud-functions

Running tasks in parallel reliably in Cloud Functions


I'm streaming and processing tweets in Firebase Cloud Functions using the Twitter API.

My stream tracks many keywords and Twitter users, so the volume of incoming tweets is very high. A new tweet often arrives before I have finished processing the previous one, and as a result some tweets never get processed.

This is how my stream looks:

...
const stream = twitter.stream('statuses/filter', {track: [various, keywords, ..., ...], follow: [userId1, userId2, userId3, userId3, ..., ...]});

stream.on('tweet', (tweet) => {
   processTweet(tweet); // This takes time: it makes multiple network requests and sometimes runs functions recursively, depending on the tweet's properties.
})
...

processTweet(tweet) essentially compiles threads from Twitter, which can take up to a few seconds depending on the length of the thread. I have already optimised processTweet(tweet) as much as possible so that it compiles threads reliably.

I want to run processTweet(tweet) in parallel and queue the tweets that arrive while earlier ones are still being processed, so that processing is reliable, as the Twitter docs specify:

Ensure that your client is reading the stream fast enough. Typically you should not do any real processing work as you read the stream. Read the stream and hand the activity to another thread/process/data store to do your processing asynchronously.

Help would be very much appreciated.


Solution

  • The Twitter streaming API will not work with Cloud Functions.

    Cloud Functions code can only be invoked in response to incoming events, and it may run for at most 9 minutes (the default timeout is 60 seconds). After that, the function is forcibly shut down. With Cloud Functions, there is no way to continually process a stream of data coming from an API.

    In order to use this API, you will need to use some other compute product that allows you to run code indefinitely on a dedicated server instance, such as App Engine or Compute Engine.
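Once the stream reader runs on a long-lived server, the hand-off that the Twitter docs describe could look roughly like the sketch below. It is only an illustration under some assumptions: processTweet is assumed to return a promise, createTweetQueue and its concurrency option are hypothetical names (not part of any Twitter or Firebase library), and the queue lives in memory, so queued tweets are lost if the process exits. A durable queue or data store would be needed for stronger guarantees.

```javascript
// Minimal in-memory work queue with bounded concurrency.
// createTweetQueue is a hypothetical helper written for this example.
function createTweetQueue(worker, { concurrency = 5 } = {}) {
  const pending = [];      // tweets waiting for a free slot
  let active = 0;          // tweets currently being processed
  let idleResolvers = [];  // callers waiting for the queue to empty

  function drain() {
    // Start work until every slot is busy or nothing is waiting.
    while (active < concurrency && pending.length > 0) {
      const tweet = pending.shift();
      active++;
      // Wrapping in a promise chain also catches synchronous throws.
      Promise.resolve()
        .then(() => worker(tweet))
        .catch((err) => console.error('processing failed:', err))
        .finally(() => {
          active--;
          drain();
          if (active === 0 && pending.length === 0) {
            idleResolvers.forEach((resolve) => resolve());
            idleResolvers = [];
          }
        });
    }
  }

  return {
    // Called from the stream handler; returns immediately.
    push(tweet) {
      pending.push(tweet);
      drain();
    },
    get active() { return active; },
    get pending() { return pending.length; },
    // Resolves once all queued tweets have been handled.
    onIdle() {
      if (active === 0 && pending.length === 0) return Promise.resolve();
      return new Promise((resolve) => idleResolvers.push(resolve));
    },
  };
}

// Usage with the stream from the question (assumes processTweet returns a promise):
// const queue = createTweetQueue(processTweet, { concurrency: 5 });
// stream.on('tweet', (tweet) => queue.push(tweet));
```

The 'tweet' handler only pushes onto the queue, so the stream is read as fast as it is delivered, while at most `concurrency` calls to the worker run at the same time. A failed tweet is logged and skipped here; retrying is left out to keep the sketch short.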