I have columns with data type DATE in sparkSQL
e.g.
CREATE TABLE ABC(startDate DATE, EndDate DATE....
and I load data as LOAD DATA INPATH './input/user.txt' INTO TABLE ABC
In user.txt data is like
2016/06/12 2016/06/15
2016/06/12 2016/06/15
but it loads data as
null null
null null
if it's
2016-06-12 2016-06-15
2016-06-12 2016-06-15
then it takes the data correctly.
How to handle data when the date separator is '/ '?
I don't want to replace the separator in input file.
Please help me. Thanks.
I faced this issue before in Hive. I found a workaround for this. First load them as string instead of Data type DATE
ex:
CREATE TABLE ABC(startDate string, EndDate string....)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ']'
STORED AS TEXTFILE
LOCATION './input/user.txt';
Then i used string functions to extract date/month/year from above fields. For example
select substr(date,1,4) as year,substr(date,6,1) as month .... from ABC
one other way is to replace the '/' with '-' and then cast them as DATE type and use Date functions
example
select regexp_replace(startDate,'/','-') from ABC
All the above is how to achieve it in Hive. To work on this in spark is also to first load them as string in to a dataframe.
val s1 = Seq(("2016/06/12", "2016/06/15" ), ("2016/06/12", "2016/06/15")).toDF("x", "y")
val result = s1.select(regexp_replace($"x","/", "-"),regexp_replace($"y","/", "-")).show()
result
+----------+----------+
| startDate| EndDate|
+----------+----------+
|2016-06-12|2016-06-15|
|2016-06-12|2016-06-15|
+----------+----------+
Hope this helps.