
How to do this type of testing in Dataflow (called feature testing at Twitter)?


We do something called feature testing, as described here: https://blog.twitter.com/engineering/en_us/topics/insights/2017/the-testing-renaissance.html

TL;DR of that article: we send a request to the microservice (a REST POST with a body), mock GCP Storage, and mock the downstream API calls, so the entire microservice can be refactored behind the tests. We can also swap out our platforms and libraries with no changes to the tests, which makes us extremely agile.

My first question is: can Dataflow (Apache Beam) receive a REST request to trigger a job? Much of the API is built around 'create job', but I don't see an 'execute job' call in the docs, although 'get status' does return the status of a job execution. I just don't see a way to trigger a job to:

  • read from my storage API (which is mockable and sits in front of GCP)
  • process the file, hopefully across many nodes
  • call the downstream APIs (which are also mockable)

Then, in my test, I simply want to simulate the HTTP call, return a real customer file when the file is read, and, once the job is done, verify that all the correct requests were sent to the downstream APIs.

We are using Apache Beam in our feature tests, though I'm not sure whether it's the same version that Google's Dataflow uses, which would be ideal. Is there a published Apache Beam version corresponding to Google's Dataflow that we can get?

thanks, Dean



Solution

  • Apache Beam's DirectRunner should be very close to Dataflow's environment, and it's what we recommend for this type of single-process pipeline test.

    My advice would be the same: use the DirectRunner for your feature tests.

    You can also use the Dataflow runner, but that sounds like it would be a full integration test. Depending on the data source / data sink, you may be able to pass it mocking utilities.

    BigQueryIO is a good example. It has a withTestServices method that you can use to pass objects that mock the behavior of external services.
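
To make the shape of such a feature test concrete, here is a minimal sketch in plain Python. It deliberately uses no Beam API: it only illustrates the flow from the question (a fake storage hands back a customer file, the job processes it, a recording fake captures the downstream requests, and the test asserts on them). All names here (Storage, DownstreamApi, FakeStorage, RecordingApi, parse_line, run_job, the file path and the CSV layout) are hypothetical. In a real pipeline the loop in run_job would be replaced by transforms executed on the DirectRunner, and how the fakes get injected depends on the I/O connectors involved.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Protocol


class Storage(Protocol):
    """Hypothetical seam in front of the real storage service."""

    def read_lines(self, path: str) -> Iterable[str]: ...


class DownstreamApi(Protocol):
    """Hypothetical seam in front of the real downstream API."""

    def post(self, payload: dict) -> None: ...


@dataclass
class FakeStorage:
    """In-memory stand-in that returns canned file contents."""

    files: Dict[str, List[str]]

    def read_lines(self, path: str) -> Iterable[str]:
        return self.files[path]


@dataclass
class RecordingApi:
    """Stand-in that records every request instead of sending it."""

    requests: List[dict] = field(default_factory=list)

    def post(self, payload: dict) -> None:
        self.requests.append(payload)


def parse_line(line: str) -> dict:
    # Assumed file layout for illustration: "customer_id,amount".
    customer_id, amount = line.split(",")
    return {"customer_id": customer_id, "amount": int(amount)}


def run_job(path: str, storage: Storage, api: DownstreamApi) -> None:
    # Stand-in for the pipeline body: read, transform, call downstream.
    for line in storage.read_lines(path):
        api.post(parse_line(line))


def test_job_sends_one_request_per_customer() -> None:
    path = "customers.csv"  # hypothetical path
    storage = FakeStorage({path: ["c1,10", "c2,25"]})
    api = RecordingApi()

    run_job(path, storage, api)

    assert api.requests == [
        {"customer_id": "c1", "amount": 10},
        {"customer_id": "c2", "amount": 25},
    ]


if __name__ == "__main__":
    test_job_sends_one_request_per_customer()
```

Because the test only talks to the two seams, the processing code in between can be refactored, or moved onto a different runner, without touching the assertions.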