
Load Date Column from raw data with '/' separator in sparkSQL


I have columns of type DATE in Spark SQL, e.g.
CREATE TABLE ABC(startDate DATE, EndDate DATE....
and I load the data with
LOAD DATA INPATH './input/user.txt' INTO TABLE ABC

In user.txt data is like

2016/06/12 2016/06/15 
2016/06/12 2016/06/15

but it loads data as

null null
null null

if it's

2016-06-12 2016-06-15 
2016-06-12 2016-06-15

then the data loads correctly.
How can I handle dates whose separator is '/'?
I don't want to replace the separator in the input file.
Please help me. Thanks.


Solution

  • I faced this issue before in Hive and found a workaround. First, load the columns as strings instead of the DATE type.

    ex:

    CREATE TABLE ABC(startDate string, EndDate string....)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '  -- the sample data is space-separated
    STORED AS TEXTFILE
    LOCATION './input/';  -- LOCATION must point to a directory, not a file
    

    Then I used string functions to extract the year/month/day from those fields. For example:

    select substr(startDate,1,4) as year, substr(startDate,6,2) as month .... from ABC
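
If a true DATE value is needed, the extracted parts can also be stitched back together with '-' separators and cast. A sketch against the ABC string table above (column name taken from the CREATE TABLE):

```sql
-- Rebuild a 'yyyy/MM/dd' string as a proper DATE value
SELECT cast(concat(substr(startDate, 1, 4), '-',
                   substr(startDate, 6, 2), '-',
                   substr(startDate, 9, 2)) AS date) AS startDate
FROM ABC;
```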
    

    Another way is to replace the '/' with '-' and then cast the result to the DATE type, so the usual date functions apply. For example:

    select regexp_replace(startDate,'/','-') from ABC
    select cast(regexp_replace(startDate,'/','-') as date) as startDate from ABC
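
There is also a way to parse the slash format directly without touching the string: unix_timestamp accepts a format pattern, and the result can be converted back to a date. A sketch under the same assumptions (works in both Hive and Spark SQL):

```sql
-- Parse 'yyyy/MM/dd' directly instead of rewriting the separator
SELECT to_date(from_unixtime(unix_timestamp(startDate, 'yyyy/MM/dd'))) AS startDate
FROM ABC;
```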
    

    All of the above is how to achieve it in Hive. In Spark the approach is the same: first load the values as strings into a DataFrame.

    // assumes: import spark.implicits._ and
    // import org.apache.spark.sql.functions.regexp_replace
    val s1 = Seq(("2016/06/12", "2016/06/15"), ("2016/06/12", "2016/06/15")).toDF("x", "y")
    val result = s1.select(
      regexp_replace($"x", "/", "-").as("startDate"),
      regexp_replace($"y", "/", "-").as("EndDate"))
    result.show()

    +----------+----------+
    | startDate|   EndDate|
    +----------+----------+
    |2016-06-12|2016-06-15|
    |2016-06-12|2016-06-15|
    +----------+----------+
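
Since Spark 2.2 the replacement step can be skipped entirely: to_date accepts a format pattern, so the slash-separated strings parse straight into DateType columns. A minimal sketch, assuming the same s1 DataFrame and an active SparkSession in scope:

```scala
import org.apache.spark.sql.functions.to_date

// Parse the slash-separated strings directly into DATE columns
val dates = s1.select(
  to_date($"x", "yyyy/MM/dd").as("startDate"),
  to_date($"y", "yyyy/MM/dd").as("EndDate"))
dates.printSchema()  // both columns are now DateType
```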
    

    Hope this helps.