scala apache-spark dataframe hbase

Validating a date column in a DataFrame in Scala?


I'm reading data from HBase using Spark, and the DataFrame has a date column in which some values are corrupted, for example 10-20176-7. How can I detect those values and replace them with a default before processing further?

Thanks.


Solution

  • I traced the failure to the following exception:

    Exception in thread "main" java.time.format.DateTimeParseException: 
    Text '20140218' could not be parsed: 
    Unable to obtain LocalDateTime from TemporalAccessor: 
    {},ISO resolved to 2014-02-18 of type java.time.format.Parsed
    at java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:1918)
    at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1853)
    at java.time.LocalDateTime.parse(LocalDateTime.java:492)
    

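    The text holds only a date (it resolved to 2014-02-18) and no time of day, so LocalDateTime.parse cannot build a result from it. Using LocalDate instead of LocalDateTime resolved the issue. Below is the sample code used.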

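    import java.time.LocalDate
    import java.time.format.{DateTimeFormatter, DateTimeParseException}
    import org.apache.spark.sql.Row

    // DATE_TIME_FORMAT is a pattern string defined elsewhere in the original code (not shown).
    // Returns true when column 40 parses as a date, false otherwise.
    def validateDfsdate(row: Row): Boolean =
      try {
        LocalDate.parse(row.getString(40), DateTimeFormatter.ofPattern(DATE_TIME_FORMAT))
        true
      } catch {
        case ex: DateTimeParseException =>
          println("Exception : " + ex)
          false
      }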
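
The validation above only reports whether a value parses; the question also asks about substituting a default. A minimal sketch of that step follows, using plain java.time so it runs without Spark. The object name DateCleaner, the pattern "yyyyMMdd" (guessed from the '20140218' value in the stack trace), and the default "19000101" are all assumptions for illustration, not part of the original code.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object DateCleaner {
  // Assumed pattern, guessed from the '20140218' text in the stack trace.
  val DateFormat: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyyMMdd")

  // Hypothetical default substituted for corrupted or missing values.
  val DefaultDate: String = "19000101"

  // Returns the input unchanged when it parses as a date, otherwise the default.
  // Option(raw) turns a null input into None, so nulls also get the default.
  def cleanDate(raw: String): String =
    Option(raw)
      .filter(s => Try(LocalDate.parse(s, DateFormat)).isSuccess)
      .getOrElse(DefaultDate)
}
```

In a Spark job, one option would be to wrap this function with org.apache.spark.sql.functions.udf and apply it to the date column via withColumn, so corrupted values are replaced before any further processing.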