I am trying to build a dynamic concatenation of fields based on some configuration settings. The goal is to have a new field with the merged values of 1 to n fields.
language = "JP;EN"
language = language.split(";")  # split() already returns a list of str, e.g. ["JP", "EN"]
no_langs = len(language)
# check if the columns for multi-language exist
for lang in language:
    doc_lang = "doctor.name_" + lang
    if doc_lang not in case_df.columns:
        case_df_final = AddColumn(case_df, doc_lang)
### combine translations of masterdata
case_df = case_df.withColumn(
    "doctor",
    F.concat(
        F.col("doctor.name_" + language[0]),
        F.lit(" // "),
        F.col("doctor.name_" + language[1]),
    ),
)
What I would like to achieve is that the new column is built dynamically, depending on the number of languages configured. E.g. if only one language is used, the result would look like this:
case_df = case_df.withColumn(
    "doctor",
    F.col("doctor.name_" + language[0])
)
For 2 or more languages it should pick up all the languages based on their order in the list. Thanks for your help. I am using Spark 2.4 with Python 3.
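The behaviour described above is exactly what Spark's F.concat_ws provides: it joins any number of columns with a separator, so building the column list from the configured languages covers the 1-language and n-language cases with the same expression. The plain-Python sketch below (with hypothetical name values) emulates that join to illustrate the idea without a Spark session:

```python
def combine_names(names):
    """Join 1..n translated names with ' // ', skipping missing values
    the way Spark's concat_ws skips NULL columns."""
    return " // ".join(n for n in names if n is not None)

# One configured language: no separator appears.
print(combine_names(["Tanaka"]))           # Tanaka
# Two languages: values joined in list order.
print(combine_names(["Tanaka", "Smith"]))  # Tanaka // Smith
```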
The final working code, which produces the expected output, is the following:
# check if the columns for multi-language exist
for lang in language:
    doc_lang = "doctor.name_" + lang
    if doc_lang not in case_df.columns:
        case_df = AddColumn(case_df, doc_lang)
    # rename the column: a dot in a column name is interpreted by Spark
    # as struct field access, so replace it with an underscore
    doc_lang_new = doc_lang.replace(".", "_")
    case_df = case_df.withColumnRenamed(doc_lang, doc_lang_new)

# build the list of flattened column names and join them dynamically
doc_fields = ["doctor_name_" + lang for lang in language]
case_df = case_df.withColumn("doctor", F.concat_ws(" // ", *doc_fields))
Thanks all for the help and hints.