Search code examples
scalaapache-sparkapache-spark-sql

Create new column from datetime field in spark scala 2.11.11


I imported a local csv file into my spark scala project. I have to calculate some statistics from the data. One of the calculations involve grouping the data by the date field and counting the occurrences by id. The second calculation involves showing the top 100 names in the data that appear frequently.

Any help would be appreciated on these items.

For the first one, I am able to create the "Month" column, but null values are there instead of just the numeric month so that I can group by the new "Month" column. Can someone help with this?

TestData_csv.withColumn("Month",date_format(to_date(col("date")), "MM")).show(false)

Solution

  • You were almost there!

    The problem in your case was that the to_date function you were using did not understand what format the strings were written in. There is also a to_date function that accepts a fmt string to make Spark understand how to parse your strings.

    That would looks something like this:

    import spark.implicits._
    
    // This is just the data you posted in a comment on your question
    val df = spark.read.option("sep", ";").option("header", "true").csv("./dateTimeCSV.csv")
    
    df.show
    +---+----------+---------+---------+-----+                                                                                                                                                                                                                                      
    | Id|      Date|    Fname|    Lname|Route|                                                                                                                                                                                                                                      
    +---+----------+---------+---------+-----+                                                                                                                                                                                                                                      
    |  1|  1/1/2017|     Greg|    Rollo|    S|                                                                                                                                                                                                                                      
    |  2|  1/2/2017|  Frankie|    Allen|    N|                                                                                                                                                                                                                                      
    |  2|  1/3/2017|   Denise|    Smith|    E|                                                                                                                                                                                                                                      
    |  2| 2/11/2017|  Earlene|     Lane|    W|                                                                                                                                                                                                                                      
    |  2| 3/22/2017|  Donovan|   Joseph|   SE|                                                                                                                                                                                                                                      
    |  3| 3/11/2017|  Rebecca|    Bowls|   NE|                                                                                                                                                                                                                                      
    |  3|  4/5/2017|Katherine|     Cook|   NW|                                                                                                                                                                                                                                      
    |  4| 5/16/2017|   Alicia|    Mason|   SW|                                                                                                                                                                                                                                      
    |  4|  6/9/2017|      Bob| Peterson|    S|                                                                                                                                                                                                                                      
    |  4| 6/30/2017|    Janet|     Love|    N|                                                                                                                                                                                                                                      
    |  5|  7/2/2017|  Richard|   Dingle|    E|                                                                                                                                                                                                                                      
    |  5| 8/25/2017|   Thomas|Velasquez|    W|                                                                                                                                                                                                                                      
    |  5| 8/10/2017|    Susan|     King|   SE|                                                                                                                                                                                                                                      
    |  1| 9/25/2017|  Pratesh|   Venkat|   NE|                                                                                                                                                                                                                                      
    |  1|10/14/2017|     Neha|    Kumar|   NW|                                                                                                                                                                                                                                      
    |  1| 11/2/2017|    Louis| Williams|   SW|                                                                                                                                                                                                                                      
    |  3|11/28/2017|  Winston|   McLean|    E|                                                                                                                                                                                                                                      
    |  3| 12/2/2017|    Clark|     Kent|    N|                                                                                                                                                                                                                                      
    |  3|12/14/2017|    Bruce|    Wayne|    S|                                                                                                                                                                                                                                      
    +---+----------+---------+---------+-----+
    
    val output = df.withColumn("Month",date_format(to_date(df("Date"), "MM/dd/yyyy"), "MM"))
    
    output.show
    +---+----------+---------+---------+-----+-----+                                                                                                                                                                                                                                
    | Id|      Date|    Fname|    Lname|Route|Month|                                                                                                                                                                                                                                
    +---+----------+---------+---------+-----+-----+                                                                                                                                                                                                                                
    |  1|  1/1/2017|     Greg|    Rollo|    S|   01|                                                                                                                                                                                                                                
    |  2|  1/2/2017|  Frankie|    Allen|    N|   01|                                                                                                                                                                                                                                
    |  2|  1/3/2017|   Denise|    Smith|    E|   01|                                                                                                                                                                                                                                
    |  2| 2/11/2017|  Earlene|     Lane|    W|   02|                                                                                                                                                                                                                                
    |  2| 3/22/2017|  Donovan|   Joseph|   SE|   03|                                                                                                                                                                                                                                
    |  3| 3/11/2017|  Rebecca|    Bowls|   NE|   03|                                                                                                                                                                                                                                
    |  3|  4/5/2017|Katherine|     Cook|   NW|   04|                                                                                                                                                                                                                                
    |  4| 5/16/2017|   Alicia|    Mason|   SW|   05|                                                                                                                                                                                                                                
    |  4|  6/9/2017|      Bob| Peterson|    S|   06|                                                                                                                                                                                                                                
    |  4| 6/30/2017|    Janet|     Love|    N|   06|                                                                                                                                                                                                                                
    |  5|  7/2/2017|  Richard|   Dingle|    E|   07|                                                                                                                                                                                                                                
    |  5| 8/25/2017|   Thomas|Velasquez|    W|   08|                                                                                                                                                                                                                                
    |  5| 8/10/2017|    Susan|     King|   SE|   08|                                                                                                                                                                                                                                
    |  1| 9/25/2017|  Pratesh|   Venkat|   NE|   09|                                                                                                                                                                                                                                
    |  1|10/14/2017|     Neha|    Kumar|   NW|   10|                                                                                                                                                                                                                                
    |  1| 11/2/2017|    Louis| Williams|   SW|   11|                                                                                                                                                                                                                                
    |  3|11/28/2017|  Winston|   McLean|    E|   11|                                                                                                                                                                                                                                
    |  3| 12/2/2017|    Clark|     Kent|    N|   12|                                                                                                                                                                                                                                
    |  3|12/14/2017|    Bruce|    Wayne|    S|   12|                                                                                                                                                                                                                                
    +---+----------+---------+---------+-----+-----+
    
    

    As you can see, I added a fmt string with as value "MM/dd/yyyy" to make Spark understand how to parse the date.

    Hope this helps!