I have a DataFrame where a column comes out as a string, but I need it to be an array.
from pyspark.sql import functions as F

alg_mappings = {
    ('Full Cover', 40): [['base,permitted_usage'], ['si_mv'], ['suburb']],  # Add more values as needed
}
default_value = None

def get_alg_value(sub_class, version_number):
    return alg_mappings.get((sub_class, version_number), default_value)

# No return type is passed here, so the UDF returns StringType by default
get_alg_value_udf = F.udf(get_alg_value)
df_with_alg = df.withColumn("alg", get_alg_value_udf(F.col("sub_class"), F.col("version")))
The alg column is a string, but I want it to be an array with exactly this structure:
[['base,permitted_usage'],['si_mv'],['suburb']]
I will be adding more elements, so a value could easily hold 25 or more entries, and I will be adding more keys as well. I therefore need the most efficient way to get it into an array.
I suggest using the udf decorator to specify the UDF's return type. The default return type is StringType, which is why you get a string representation of the output.
Output as a list of strings:
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

@udf(ArrayType(StringType()))
def get_alg_value(sub_class, version_number):
    return alg_mappings.get((sub_class, version_number), default_value)
Output as a list of lists of strings (this matches the nested structure in your mapping):
@udf(ArrayType(ArrayType(StringType())))
def get_alg_value(sub_class, version_number):
    return alg_mappings.get((sub_class, version_number), default_value)
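For reference, here is a minimal end-to-end sketch of the list-of-lists variant, using F.udf(f, returnType) instead of the decorator so the lookup stays an ordinary Python function. The helper name build_df_with_alg is hypothetical, and the column names sub_class and version are taken from the question. I have not run this against your data, so treat it as a starting point rather than a finished solution.

```python
# Sketch only: build_df_with_alg is a hypothetical helper; the column names
# "sub_class" and "version" are assumed from the question.

alg_mappings = {
    ("Full Cover", 40): [["base,permitted_usage"], ["si_mv"], ["suburb"]],
    # Add more (sub_class, version) keys as needed
}


def get_alg_value(sub_class, version_number):
    """Plain-Python lookup; returns None (a null in Spark) for unknown keys."""
    return alg_mappings.get((sub_class, version_number))


def build_df_with_alg(df):
    """Return df with an extra "alg" column typed as array<array<string>>."""
    # Imported here so the lookup above can be exercised without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

    # Declare the nested return type explicitly instead of the StringType default.
    alg_type = ArrayType(ArrayType(StringType()))
    get_alg_value_udf = F.udf(get_alg_value, alg_type)
    return df.withColumn(
        "alg", get_alg_value_udf(F.col("sub_class"), F.col("version"))
    )
```

Calling df_with_alg = build_df_with_alg(df) and then df_with_alg.printSchema() should show alg as an array whose elements are arrays of strings. Rows whose (sub_class, version) pair is not in alg_mappings get a null.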