
PySpark: Why does using F.expr work but using the PySpark API does not


I have this line of code:

df = df.withColumn("final_name", F.substring(F.col("name"), 1, F.length(F.col("name"))-15))

When I run it, I get the error Column is not iterable (something with length seems to be causing the issue). However, when I use the equivalent code with F.expr(), it works. Why is that?

df = df.withColumn("final_name", F.expr("substring(name, 1, length(name)-15)"))

This is really more for my own education on why my original code doesn't work. Thanks for your help.


Solution

  • substring(str: ColumnOrName, pos: int, len: int) only accepts plain Python ints for pos and len. Passing a Column, such as F.length(F.col("name")) - 15, is what raises Column is not iterable. F.expr() works because the whole string is parsed as a SQL expression, where substring's arguments can themselves be arbitrary expressions like length(name)-15.

    Use substr(str: ColumnOrName, pos: ColumnOrName, len: Optional[ColumnOrName] = None) (available since Spark 3.5) if you want pos or len to be computed per row. Note that all of its arguments must be Columns, so wrap the literal 1 in F.lit():

    df = df.withColumn("final_name", F.substr(F.col("name"), F.lit(1), F.length(F.col("name")) - 15))
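
    For contrast, the original F.substring call is fine as long as pos and len are hardcoded ints (name_prefix below is just an illustrative column name):

    df = df.withColumn("name_prefix", F.substring(F.col("name"), 1, 5))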
    

    substr is also available as a method on Column (pyspark.sql.Column.substr), where it accepts either plain ints or Columns.

    So these two lines are equivalent:

    df.withColumn("final_name", df.name.substr(F.lit(1), F.length(df.name)-15))
    df.withColumn("final_name", F.col("name").substr(F.lit(1), F.length(F.col("name"))-15))
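
    If you want to verify the equivalence end to end, here is a minimal, self-contained sketch (it assumes Spark 3.5+ so that F.substr exists; the sample names are made up for illustration):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Sample rows whose "name" values are longer than 15 characters.
    df = spark.createDataFrame(
        [("project_alpha_2024_internal",), ("quarterly_report_final_backup",)],
        ["name"],
    )

    # 1. SQL string: parsed as SQL, so length(name)-15 is a valid argument.
    expr_df = df.withColumn("final_name", F.expr("substring(name, 1, length(name)-15)"))

    # 2. F.substr (Spark 3.5+): every argument must be a Column, hence F.lit(1).
    substr_df = df.withColumn("final_name", F.substr(F.col("name"), F.lit(1), F.length(F.col("name")) - 15))

    # 3. Column.substr: takes start and length as ints or Columns.
    col_df = df.withColumn("final_name", F.col("name").substr(F.lit(1), F.length(F.col("name")) - 15))

    # All three show identical results.
    expr_df.show(truncate=False)
    substr_df.show(truncate=False)
    col_df.show(truncate=False)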