I am using Databricks Repos. I have two files: my function lives in a file called func.py inside a folder called folder1:

```python
def lower_events(df):
    return df.withColumn("event", f.lower(f.col("event")))
```
My main notebook, in which I call lower_events:

```python
import pyspark.sql.functions as f
from pyspark.sql.functions import udf, col, lower
import sys
sys.path.append("..")

from folder1 import func

df_clean = func.lower_events(df)
```
This returns an error:

```
NameError: name 'f' is not defined
```
But this works:

```python
def lower_events(df):
    import pyspark.sql.functions as f
    from pyspark.sql.functions import col, when
    return df.withColumn("event", f.lower(f.col("event")))
```
The error is correct: each Python module has its own namespace, and names imported in the main module (or in any other module) are not visible inside it (see the Python documentation on modules for details).
So your func.py should contain the imports somewhere. They don't have to be inside the function itself; the top level of the file is fine:

```python
import pyspark.sql.functions as f
from pyspark.sql.functions import col, when

def lower_events(df):
    return df.withColumn("event", f.lower(f.col("event")))
```
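To see why the module-level import fixes it, here is a small Spark-free sketch of the namespace rule. It simulates two versions of func.py in one file using `types.ModuleType`; the names `lower_events` and `f_lower` here are hypothetical stand-ins, not pyspark APIs:

```python
import types

# Simulate func.py WITHOUT its own definition of f_lower: the function's
# globals are the module's dict, which doesn't contain f_lower.
broken = types.ModuleType("func_broken")
exec("def lower_events(s):\n    return f_lower(s)", broken.__dict__)

f_lower = str.lower  # defined in the *caller's* namespace only

try:
    broken.lower_events("ABC")
except NameError as err:
    print(err)  # name 'f_lower' is not defined

# Simulate func.py WITH f_lower defined at the module's top level:
# now the name resolves inside the module itself.
fixed = types.ModuleType("func_fixed")
exec("f_lower = str.lower\n"
     "def lower_events(s):\n    return f_lower(s)", fixed.__dict__)
print(fixed.lower_events("ABC"))  # abc
```

A function always resolves free names in the module where it was defined, not in the module that calls it, which is exactly why importing `pyspark.sql.functions as f` in the notebook doesn't help func.py.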
P.S. You also may not need sys.path.append(".."): Databricks Repos automatically adds the root of the repository to sys.path.