
Capitalize the first letter of each word | Spark Scala


I have a table as below -

ID  City               Country
1   Frankfurt am main  Germany

The dataframe needs to be displayed with the first letter of each word in the City column capitalized, i.e. the output should look like this ->

ID  City               Country
1   Frankfurt Am Main  Germany

The solution I worked with is as below ->

df.map(x => x.getString(1).trim().split(' ').map(_.capitalize).mkString(" ")).show()

This returns only the City column, aliased as "value".

How can I keep all the columns while applying the above transformation?


Solution

  • You can use the initcap function (see the docs):

    public static Column initcap(Column e)

    Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.

    For example, "hello world" will become "Hello World".

    Parameters: e - (undocumented)
    Returns: (undocumented)
    Since: 1.5.0

    Sample code

    import org.apache.spark.sql.functions._
    import spark.implicits._ // needed for toDF; assumes a SparkSession named spark is in scope

    val data = Seq(("1", "Frankfurt am main", "Germany"))
    val df = data.toDF("Id", "City", "Country")
    df.withColumn("City", initcap(col("City"))).show()
    

    And the output is:

    +---+-----------------+-------+
    | Id|             City|Country|
    +---+-----------------+-------+
    |  1|Frankfurt Am Main|Germany|
    +---+-----------------+-------+
    

    Your sample code was returning only one column because that is exactly what you coded in your map: take x (each Row of your df), get the column at index 1 from it, apply some transformations, and return it.

    You could do what you wanted with map, as you can see in other answers, but the output of your map needs to include all columns; see the sketch below.
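    A minimal sketch of that map-based approach, assuming the same df as above and a SparkSession named spark in scope (the tuple field order mirrors the original columns; the name result is illustrative):

    import spark.implicits._ // provides the encoder for the tuple returned by map

    // Rebuild every column inside map, not just City,
    // so Id and Country survive the transformation.
    val result = df.map { row =>
      val city = row.getString(1).trim.split(' ').map(_.capitalize).mkString(" ")
      (row.getString(0), city, row.getString(2))
    }.toDF("Id", "City", "Country")

    result.show()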

    Why am I not using map in my answer? The general rule is: when there is a built-in SQL function, use it instead of a custom map/UDF. Most of the time the SQL function will be better in terms of performance, because it is easier for Catalyst to optimize. A hand-rolled UDF version is sketched below for comparison.
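    For illustration, a UDF doing the same capitalization by hand might look like this (the name capitalizeWords is hypothetical, not from any answer). Catalyst treats the UDF body as a black box, while initcap is a built-in expression it can reason about:

    import org.apache.spark.sql.functions.{col, udf}

    // Illustrative hand-rolled equivalent of initcap.
    // Catalyst cannot optimize through this function body.
    val capitalizeWords = udf { s: String =>
      s.trim.split(' ').map(_.capitalize).mkString(" ")
    }

    df.withColumn("City", capitalizeWords(col("City"))).show()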