Tags: java, spring, spring-batch, impala, apache-kudu

How to test a Spring Batch step that reads from a database and writes to a file?


I would like to know what would be the best approach to test the below scenario in a Spring Batch job:

  • A job consisting of two steps:

1) The first step reads from a database using an ItemReader (Apache Kudu, queried through Impala) and writes the content returned by the query to a file.

  • That itemReader has a rowMapper that builds a complex object from the result set. Its itemWriter just calls toString (which in fact returns a JSON representation) on that complex object.

2) The second step reads the file generated by step 1 and processes it. After the whole file has been processed, everything is written to a new file.

  • The itemReader reads the file from step 1 using a jsonLineMapper; the step then processes the complex objects produced by the mapper and writes them to a new file.

Then the job's listener uploads both files to S3.

I need this workflow because the first step generates the sample needed by the second step. If someday I need to test only the second step, I can reuse an old sample from the first step: the database changes over time, so without a saved sample I might not be able to regenerate the sample from, say, an execution two days earlier.

The first step is the hardest one to test, but I would like to test both steps as follows:

1) For step 1, I need to check that the query syntax is correct, that the rowMapper generates correct objects from the database result set, and that the content of the file written by the itemWriter is correct (correct meaning that it matches what is expected).

2) The second step is easier to test, as I can start from a predefined file. The test should check that reading the file with the jsonLineMapper works correctly. The processing part is tested separately, but I could run one simple workflow through it and check that the final file has the expected content.

My idea for testing that scenario was:

1) In order to check that the query syntax is correct, I need a query builder (I googled and found libraries like jOOQ, but I don't want to add an external library just to build a query string). After checking that the query is correct, maybe I should mock the database, return a predefined complex object, and write it to the file. The problem is that if the query is missing a column, the object would not be correct and the test should fail; but if I return a predefined object, I would never know what the query actually returns.

As you can see, the root of the problem is validating the query: if the query is correct, I can test the rowMapper and the final file.

2) For this step, I thought the best approach would be to have a predefined file with correct content from step 1 and just check that the final file content is what I expect. I think this step is easy to test.
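The concern in point 1, that a mapping test should fail when the query omits a column, can be sketched with the JDK alone. The following is a minimal illustration, not the actual job code: `Sample`, `mapRow`, `fakeRow`, and the column names `id` and `name` are all hypothetical, and `mapRow` is simplified compared with Spring's `RowMapper.mapRow(ResultSet, int)`. It does not validate the SQL against Impala; it only shows a stand-in result set that raises `SQLException` for any column the "query" did not return.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;

public class RowMappingCheck {

    /** Hypothetical stand-in for the complex object built by the rowMapper. */
    public static final class Sample {
        final long id;
        final String name;

        Sample(long id, String name) {
            this.id = id;
            this.name = name;
        }

        /** JSON-like representation, as the question describes for toString. */
        @Override
        public String toString() {
            return "{\"id\":" + id + ",\"name\":\"" + name + "\"}";
        }
    }

    /** Simplified mapping logic: reads each column by label. */
    public static Sample mapRow(ResultSet rs) throws SQLException {
        return new Sample(rs.getLong("id"), rs.getString("name"));
    }

    /**
     * Builds a single-row ResultSet backed by a map (assumption: values already
     * have the Java type the getter returns, e.g. Long for getLong).
     * Asking for a column that is not in the map raises SQLException.
     */
    public static ResultSet fakeRow(Map<String, Object> columns) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean byLabel = args != null && args.length == 1 && args[0] instanceof String;
            if (byLabel && method.getName().startsWith("get")) {
                String label = (String) args[0];
                if (!columns.containsKey(label)) {
                    throw new SQLException("Column not found: " + label);
                }
                return columns.get(label);
            }
            // Anything else is outside the scope of this sketch.
            throw new UnsupportedOperationException(method.getName());
        };
        return (ResultSet) Proxy.newProxyInstance(
                RowMappingCheck.class.getClassLoader(),
                new Class<?>[] {ResultSet.class},
                handler);
    }

    public static void main(String[] args) throws SQLException {
        // All expected columns present: the mapping succeeds.
        System.out.println(mapRow(fakeRow(Map.of("id", 1L, "name", "a"))));

        // The "query" forgot the name column: the mapping fails.
        try {
            mapRow(fakeRow(Map.of("id", 1L)));
        } catch (SQLException e) {
            System.out.println("mapping failed: " + e.getMessage());
        }
    }
}
```

A fake like this only covers the mapping half of the problem; whether the real query returns those columns still has to be checked against a database.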

Any better way or approach for testing this scenario?

Thanks!


Solution

  • For step 1, I would recommend using an embedded database: insert some rows, run your job, and then assert that the generated file is correct. This gives you control over the test data, so you can validate both your query and the expected content of the file. You can find an example here: https://docs.spring.io/spring-batch/4.0.x/reference/html/testing.html#endToEndTesting. Spring Batch provides AssertFile.assertFileEquals to test whether two files are equal, which can help you validate the output of step 1 against an expected file.

    For step 2, you can create some valid/invalid files (these can be the output of step 1) and use them as input to test step 2. The caveat, though, is that if the output of step 1 changes, those files will no longer be valid for testing step 2 (so this is a maintenance cost you need to be aware of).
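To make the "compare the generated file with an expected file" idea concrete, here is a minimal plain-JDK sketch of a line-by-line comparison. It is a stand-in for illustration, not how AssertFile.assertFileEquals is implemented; the class name `GoldenFileCheck`, the method `firstDifferingLine`, and the sample JSON lines are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class GoldenFileCheck {

    /**
     * Returns the 1-based number of the first line that differs between the
     * two files, or -1 when both files contain exactly the same lines.
     */
    public static int firstDifferingLine(Path expected, Path actual) throws IOException {
        List<String> expectedLines = Files.readAllLines(expected, StandardCharsets.UTF_8);
        List<String> actualLines = Files.readAllLines(actual, StandardCharsets.UTF_8);

        int common = Math.min(expectedLines.size(), actualLines.size());
        for (int i = 0; i < common; i++) {
            if (!expectedLines.get(i).equals(actualLines.get(i))) {
                return i + 1;
            }
        }
        // Shared prefix is identical: the files differ only if one has extra lines.
        return expectedLines.size() == actualLines.size() ? -1 : common + 1;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical content: "expected" plays the stored sample,
        // "actual" plays the file a step has just written.
        Path expected = Files.createTempFile("expected", ".json");
        Path actual = Files.createTempFile("actual", ".json");
        Files.write(expected, List.of("{\"id\":1}", "{\"id\":2}"));
        Files.write(actual, List.of("{\"id\":1}", "{\"id\":3}"));

        System.out.println(firstDifferingLine(expected, actual)); // prints 2
    }
}
```

Reporting the first differing line, rather than just true or false, makes it quicker to see whether a failure comes from a changed step 1 output or from a real regression in step 2.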