
Test Parquet with Python


I am trying to mock the parquet call and assert that it is called with the correct path, but I can't get the mock set up correctly. How can I make the mocked option function return a mocked parquet?

Code under test

def read_from_s3(spark, path):
    return spark.read.option('mergeSchema', 'true').parquet(path)

Test

import unittest
import mock
from src.read_from_s3 import read_from_s3


class TestReadFromS3(unittest.TestCase):
    def test_read_from_s3__called_with_correct_params(self):
        spark = mock.MagicMock()
        spark.read.option = mock.MagicMock()
        spark.read.option.parquet = mock.MagicMock()
        path = 'my_path'

        read_from_s3(spark, path)

        spark.read.option.assert_called_with('mergeSchema', 'true') # this passes
        spark.read.option.parquet.assert_called_with(path) # this fails

Test failure

AssertionError: Expected 'parquet' to have been called once. Called 0 times.

Solution

  • parquet is not an attribute of spark.read.option; it is an attribute of the value that option returns, which a mock exposes as return_value. You also don't need to create the child mocks explicitly, because attribute lookup on a MagicMock returns another mock.

    def test_read_from_s3__called_with_correct_params(self):
        spark = mock.MagicMock()
        path = 'my_path'

        read_from_s3(spark, path)

        spark.read.option.assert_called_with('mergeSchema', 'true')
        # parquet is called on whatever option(...) returned
        spark.read.option.return_value.parquet.assert_called_with(path)
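
To see the difference between the two mocks outside a test class, here is a minimal standalone sketch. It assumes the standard library's unittest.mock rather than the separate mock package, and it repeats read_from_s3 from the question so it runs on its own; the variable names option_result and result are only illustrative.

```python
from unittest import mock


def read_from_s3(spark, path):
    # Same function as in the question, repeated so the sketch is self-contained.
    return spark.read.option('mergeSchema', 'true').parquet(path)


spark = mock.MagicMock()
result = read_from_s3(spark, 'my_path')

# Calling a mock returns its return_value, so the chained call is recorded there.
option_result = spark.read.option.return_value
option_result.parquet.assert_called_once_with('my_path')

# The attribute on the option mock itself is a different mock that was never called.
spark.read.option.parquet.assert_not_called()

# The function returns whatever the mocked parquet() call returned.
assert result is option_result.parquet.return_value
```

The same pattern extends to longer chains: each call in the chain adds one more return_value hop before the next attribute.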