What is the easiest way of asserting specific cell values in PySpark DataFrames?
+---------+--------+
|firstname|lastname|
+---------+--------+
|James    |Smith   |
|Anna     |null    |
|Julia    |Williams|
|Maria    |Jones   |
|Jen      |Brown   |
|Mike     |Williams|
+---------+--------+
I want to assert the presence of the values null and "Jen" in their respective rows/columns of this DataFrame.
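For reference, here is a minimal sketch that reconstructs the example above (the SparkSession setup is my assumption, not part of the original question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Reconstruction of the example data shown above
data = [
    ("James", "Smith"),
    ("Anna", None),
    ("Julia", "Williams"),
    ("Maria", "Jones"),
    ("Jen", "Brown"),
    ("Mike", "Williams"),
]
df = spark.createDataFrame(data, ["firstname", "lastname"])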
Ideally I could use Pandas-style indexing, something like:
assert df['firstname'][4] == "Jen"
assert df['lastname'][1] is None
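(As an aside, that style does work after converting with toPandas(), though it pulls the whole DataFrame to the driver and assumes a stable row order; nulls in string columns typically come back as None:)

pdf = df.toPandas()  # brings every row to the driver
assert pdf['firstname'][4] == "Jen"
assert pdf['lastname'][1] is None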
From what I found, using collect() is the way to go (indexing into the collected rows is roughly the equivalent of iloc on a Pandas DataFrame):
# collect() returns a list of Row objects; collect once, then index
rows = df.collect()
assert rows[4]['firstname'] == 'Jen'
assert rows[1]['lastname'] is None
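One caveat with this approach: positional indexing into the collected rows only makes sense if the row order is deterministic, and Spark does not guarantee order without an explicit sort. A sketch of an order-independent check, assuming firstname identifies the rows of interest:

# Look the rows up by value instead of by position
assert df.filter(df.firstname == "Jen").count() == 1
anna = df.filter(df.firstname == "Anna").first()  # Row or None
assert anna is not None and anna["lastname"] is None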