
Assert a value of a specific cell in spark df in python


What is the easiest way to assert specific cell values in a PySpark DataFrame?

+---------+--------+
|firstname|lastname|
+---------+--------+
|James    |Smith   |
|Anna     | null   |
|Julia    |Williams|
|Maria    |Jones   |
|Jen      |Brown   |
|Mike     |Williams|
+---------+--------+

I want to assert the existence of the values null and "Jen" in their respective rows/columns of this DataFrame.

Ideally I would write something like this (Pandas-style indexing, which does not work on a Spark DataFrame):

assert df['firstname'][4] == "Jen"
assert df['lastname'][1] == None

Solution

  • From what I found, collect() is the way to go: indexing the collected list of Rows is roughly the equivalent of iloc on a Pandas DataFrame. Note that collect() pulls the entire DataFrame to the driver and re-runs the job on every call, so for more than one assertion, collect once and reuse the result:

    assert df.collect()[4]['firstname'] == 'Jen'
    assert df.collect()[1]['lastname'] is None