Mypy throws the error "Value of type 'Row | None' is not indexable" for the line starting with "x=":
from pyspark.sql import DataFrame
from pyspark.sql import functions as f

def somefunction(df: DataFrame, column_name: str) -> DataFrame:
    x = df.select(f.min(f.col(column_name))).first()[0]
    return df.withColumn('newcolumn', f.col(column_name) + x)
How can I add a type check that passes mypy?
df.select(...).first() returns a value of type Row | None, which means the actual return value might be a Row, or it might be None. You can't index None, so you can't index a value of type Row | None until you establish that it definitely isn't None. (Essentially, the interface of a union type is the intersection of the individual types' interfaces: you can only do with A | B what you can do with both A and B.)
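A quick illustration of that principle, unrelated to PySpark: with a value of type int | str, mypy only accepts operations that both int and str support.

def describe(value: int | str) -> None:
    print(str(value))          # fine: both int and str can be passed to str()
    print(value.bit_length())  # mypy error: "str" has no attribute "bit_length"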
One way to do that is type narrowing: by checking whether the return value is None, you can branch into code where the static type is None, or into code where the static type is Row.
def somefunction(df: DataFrame, column_name: str) -> DataFrame:
    # reveal_type(result) == Row | None
    result = df.select(f.min(f.col(column_name))).first()
    if result is None:
        # reveal_type(result) == None
        raise ValueError('no rows returned')  # or whatever suits your use case
    else:
        # reveal_type(result) == Row
        x = result[0]
        return df.withColumn('newcolumn', f.col(column_name) + x)
You might know that this particular call to first() can never return None, but mypy does not.
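If you are that sure, an assert is a lightweight way to say so: mypy narrows the type after the assertion, and you get a clear AssertionError at runtime if the assumption is ever wrong. A minimal sketch:

result = df.select(f.min(f.col(column_name))).first()
assert result is not None  # narrows result from Row | None to Row
x = result[0]

(Keep in mind that asserts are skipped when Python runs with the -O flag.)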
Another option, if you are absolutely sure you will get a Row back, is to use typing.cast (with Row imported from pyspark.sql) to let mypy in on the secret.
x = typing.cast(Row, df.select(f.min(f.col(column_name))).first())[0]
return df.withColumn('newcolumn', f.col(column_name) + x)
This is risky, though: mypy will believe the cast, and if first() does return None, you'll get a runtime error even though mypy says it's OK.
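To see why, here is a minimal illustration with nothing PySpark-specific in it: cast performs no check at runtime, so a wrong cast only moves the failure to wherever the value is actually used.

from typing import cast

value = cast(int, None)  # mypy accepts this; nothing is checked at runtime
value.bit_length()       # AttributeError: 'NoneType' object has no attribute 'bit_length'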