Search code examples
javatddintegration-testingamazon-redshiftdbunit

Redshift integration tests (or other SQL DB). Verify table contents using CSV files in Java


I want to create integration test that checks any DB table contents without knowing the table structure. Some sort of testing framework, I put a set of CSV files with names like table_to_test.csv, the test picks those files and asserts equality with DB contents.

By documentation this is achievable using DBUnit

    IDataSet databaseDataSet = dbUnitConnection.createDataSet();
    ITable actualTable = databaseDataSet.getTable(tableName);

    // Load expected data from an CSV dataset
    IDataSet expectedDataSet = new CsvURLDataSet(resource.getURL());
    ITable expectedTable = expectedDataSet.getTable(tableName);

    // Assert actual database table match expected table
    org.dbunit.Assertion.assertEquals(expectedTable, actualTable);

Although this is exactly what I need, this solution appeared undocumented, inflexible and therefore possibly unreliable. I also was not able to quickly find DBUnit bug tracker.

I wonder if there is alternative solution. CSV to CSV comparison frameworks might also work, since on Redshift I can unload data from tables to CSV files.


Solution

  • Eventually I was able to use DB Unit. If you are using Spring you can setup DB connection in this way (@BeforeEach is from JUnit5)

    @Autowired
    private DataSource dataSource;
    private DatabaseDataSourceConnection dbUnitConnection;
    
    @BeforeEach
    void beforeEach() throws SQLException {
        dbUnitConnection = new DatabaseDataSourceConnection(dataSource, schema);
    }
    

    For assertions you will need the code in the original question. You also will need to specify all the table names you are going to test in table-ordering.txt (the resource instance should point to this file). The file must be placed to the same directory where all the CSV files sit. Example of table-ordering.txt:

    my_table_1
    my_table_2
    my_table_3
    

    CSV files with expected data must have names my_table_1.csv, my_table_2.csv and my_table_3.csv respectively. If you need to assert null values you must explicitly specify the word "null" in CSV. Also watch out for data in your CSVs. In my case I had one extra blank symbol in the headline. In the text editor it was invisible, however the test was failing, and it was pretty hard to debug.

    The CSV escape symbol is '\' .