apache-spark, pyspark, spark-structured-streaming

PySpark Structured Streaming: Pass output of Query to API endpoint


I have the following dataframe in Structured Streaming:

TimeStamp | Room | Temperature
00:01:29  | 1    | 55
00:01:34  | 2    | 51
00:01:36  | 1    | 56
00:02:03  | 2    | 49

I am trying to detect when the temperature falls below a certain threshold (50 in this case). I have that part of the query working. Now I need to pass this information to an API endpoint via a POST call to '/api/lowTemperature/<room>', with the timestamp and the temperature in the body of the request. So, in the above case, I need to send:

POST /api/lowTemperature/2
BODY: { "TimeStamp":"00:02:03",
       "Temperature":"49" }

Any idea how I can achieve this using PySpark?

One way I thought of doing this was using a custom streaming sink, but I can't seem to find any documentation on achieving this in Python.
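
For reference, the detection part I already have looks roughly like this (assuming the streaming DataFrame is called readings, with the column names from the table above):

from pyspark.sql.functions import col

# `readings` is the streaming DataFrame shown above (TimeStamp, Room, Temperature).
low_temps = readings.filter(col("Temperature") < 50)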


Solution

  • Good news: support for Python has recently been added to the foreach sink (ForeachWriter) in Structured Streaming. I created one in Python for REST and Azure Event Grid and it's rather straightforward. The (basic) documentation can be found here: https://docs.databricks.com/spark/latest/structured-streaming/foreach.html#using-python
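
    As a rough illustration, a Python foreach writer for this case could look like the sketch below. It assumes the already-filtered streaming DataFrame low_temps (TimeStamp, Room, Temperature) from the question, uses requests for the HTTP call, and the endpoint host is a placeholder you would replace with your own.

    import requests

    # Minimal sketch of a Python foreach writer; the endpoint host is a placeholder
    # and `low_temps` is assumed to be the already-filtered streaming DataFrame.
    class LowTemperaturePoster:
        def open(self, partition_id, epoch_id):
            # Called once per partition and epoch; return True to process its rows.
            return True

        def process(self, row):
            # One POST per low-temperature row: room number in the path,
            # timestamp and temperature in the JSON body.
            requests.post(
                "http://localhost:8000/api/lowTemperature/{}".format(row["Room"]),
                json={"TimeStamp": row["TimeStamp"], "Temperature": str(row["Temperature"])},
            )

        def close(self, error):
            # Surface any error so the streaming query fails instead of silently dropping data.
            if error:
                raise error

    query = (low_temps.writeStream
             .foreach(LowTemperaturePoster())
             .start())
    query.awaitTermination()

    Note that process() runs once per row on the executors, so a high-volume stream fires one HTTP request per row; from Spark 2.4 onwards you can use foreachBatch instead to group requests per micro-batch.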