apache-spark pyspark palantir-foundry foundry-code-repositories foundry-code-workbooks

How to use when and otherwise on a Spark DataFrame with boolean columns?


I have a dataset with three columns: country (string), threshold_1 (bool), and threshold_2 (bool).

I am trying to create a new column with the logic below, but I am getting an error.

I am using a Palantir Foundry Code Workbook for this. Can anyone tell me what I am missing?

df = df.withColumn("Threshold_Filter", 
        when(df["country"]=="INDIA" & df["threshold_1"]==True | df["threshold_2 "]==True, "Ind_country"
     ).otherwise("Dif_country"))

Solution

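  • Wrap each comparison in its own parentheses. In Python, & and | bind more tightly than ==, so without parentheses Python groups "INDIA" & df["threshold_1"] together before applying any ==.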

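    from pyspark.sql.functions import when

    # & binds more tightly than |, so this condition reads as
    # (country is INDIA AND threshold_1) OR threshold_2.
    # The column is assumed to be named "threshold_2", as described in the
    # question; the trailing space inside the quotes has been removed.
    df = (
        df
        .withColumn(
            "Threshold_Filter",
            when(
                (df["country"] == "INDIA") &
                (df["threshold_1"] == True) |
                (df["threshold_2"] == True),
                "Ind_country")
            .otherwise("Dif_country"))
    )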
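
To see why the parentheses matter, here is a small sketch that needs no Spark at all. It uses Python's standard ast module to show how the parser groups the two versions of the condition. The names country, t1 and t2 are placeholders standing in for the DataFrame columns, not PySpark objects, and top_level is a helper made up for this illustration. It assumes Python 3.8 or later.

```python
import ast

# The condition from the question and from the answer, with plain names
# (country, t1, t2) standing in for the DataFrame columns.
unparenthesized = 'country == "INDIA" & t1 == True | t2 == True'
parenthesized = '(country == "INDIA") & (t1 == True) | (t2 == True)'


def top_level(expr: str) -> str:
    """Return the name of the outermost operation Python parses in expr."""
    node = ast.parse(expr, mode="eval").body
    if isinstance(node, ast.BinOp):
        # For a binary operator, report which one (BitAnd, BitOr, ...).
        return type(node.op).__name__
    return type(node).__name__


# Without parentheses the whole thing is one chained comparison:
#   country == ("INDIA" & t1) == (True | t2) == True
bad = ast.parse(unparenthesized, mode="eval").body
bad_operands = [type(c).__name__ for c in bad.comparators]

print(top_level(unparenthesized), bad_operands)  # Compare ['BinOp', 'BinOp', 'Constant']
print(top_level(parenthesized))                  # BitOr
```

The first line of output shows that the unparenthesized form asks Python to compute "INDIA" & t1 and True | t2 as operands of a chained ==, which is not a valid column condition. The second shows that the parenthesized form is an | whose left side is the & of the first two comparisons. If the intended logic is instead "INDIA and (threshold_1 or threshold_2)", the last two comparisons would need one more pair of parentheses around them.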