Search code examples
scalaapache-sparkapache-spark-sqlazure-databricks

Map monthname to monthnumber in Azure databricks using scala


Need to map monthname to monthnumber in Azure databricks using scala. I have column name PERIOD which have data like months name(like Jan,Feb,Mar,....,Nov,Dec), I want to replace this monthname with monthnumber (like 01,02,03,...,11,12). Result should be like Jan,Feb,Mar,..,Nov,Dec replaced with 01,02,03,...,11,12.

      "Jan"  -> 01,
      "Feb"  -> 02,
      "Mar"  -> 03,
      "Apr"  -> 04,
      "May"  -> 05,
      "Jun"  -> 06,
      "Jul"  -> 07,
      "Aug"  -> 08,
      "Sep"  -> 09,
      "Oct"  -> 10,
      "Nov"  -> 11,
      "Dec"  -> 12

I'm new to scala and azure databricks. I tried mapping approach but not getting desired solution. enter image description here


Solution

  • I have a dataframe with column called month_in_words as in the following image:

    enter image description here

    • Now to convert this month in words to month number, first I have created a map to where key is the month name and value is month number as string (Since we need month number as 01, but not 1).
    val monthNumber = Map(
          "Jan" -> "01",
          "Feb" -> "02",
          "Mar" -> "03",
          "Apr" -> "04",
          "May" -> "05",
          "Jun" -> "06",
          "Jul" -> "07",
          "Aug" -> "08",
          "Sep" -> "09",
          "Oct" -> "10",
          "Nov" -> "11",
          "Dec" -> "12"
      )
    
    • Now you can use the following code to convert month name to month number:
    var final_df = df.rdd.map(f => {
        val number = monthNumber(f.getString(0))
        (f.getString(0),number)
    }).toDF("month_in_words","month_in_number")
    //display(df)
    

    enter image description here

    • To directly convert, you can use the following code:
    var final_df = df.rdd.map(f => monthNumber(f.getString(0))).toDF("month_in_number")
    //display(final_df)
    

    enter image description here