I have this line of code:
df = df.withColumn("final_name", F.substring(F.col("name"), 1, F.length(F.col("name"))-15))
When I run it I get the error TypeError: Column is not iterable (something with length is causing the issue). However, when I use the equivalent code with F.expr(), it works. Why is that?
df = df.withColumn("final_name", F.expr("substring(name, 1, length(name)-15)"))
This is really more for my own education on why my original code doesn't work. Thanks for your help.
The substring(str: ColumnOrName, pos: int, len: int) function only accepts static (hardcoded) int values for pos and len, so passing a Column such as F.length(F.col("name")) - 15 raises TypeError: Column is not iterable. Your F.expr() version works because the whole string is parsed as Spark SQL, where length(name) - 15 is resolved by the SQL engine rather than passed as a Python int argument.
Use substr(str: ColumnOrName, pos: ColumnOrName, len: Optional[ColumnOrName] = None) (added in Spark 3.5) if you want pos and len to be computed at runtime.
df = df.withColumn("final_name", F.substr(F.col("name"), F.lit(1), F.length(F.col("name")) - 15))
Note that substring only lives in pyspark.sql.functions, while substr is available both as pyspark.sql.functions.substr (Spark 3.5+) and as the Column method pyspark.sql.column.Column.substr.
So these do the same (Column.substr requires startPos and length to be the same type, both int or both Column, hence F.lit(1) instead of a bare 1):
df.withColumn("final_name", df.name.substr(F.lit(1), F.length(df.name)-15))
df.withColumn("final_name", F.col("name").substr(F.lit(1), F.length(F.col("name"))-15))