needing some advice on splitting a number into a date timestamp, currently using Hue to query the hive db;
In a table I have a column that is used to capture a unique ref for a record. The value looks like this;
219872021081000741
Contained within this is a date and time, I'm looking to extract (using sql) the date/time from this and have it as a column of its own. Here is the breakdown of the number:
Based on the bold values from left to right is DD YYYY MM HHMM
21 987 2021 08 1000 741
regex [0-3]?[0-9]{1}$ref[2][0-9][0-9][0-9][0-1][0-9][0-2][0-9][0-5][0-9][0-9]{3}_"
Using sql, I want to assess the number then create a column that then formats it to DD-MM-YY HHMM as timestamp. Have reviewed some posts, and trying out a few things, but not having much luck. The other sticking point is the DD will not always be 2 values eg, if it was the 1st then it will be 1 not 01.
Trying to incorporate into the below. Thanks in advance for any advice.
select *,
cast((UTC +(60*60*12)*1000)/1000 as TIMESTAMP) as `LocalTime`
from Table.Name
where
name rlike 'FieldValue.*'
UPDATE: In a roundabout way I updated the sql to do a count of the value.
Here is the update sql, just need to figure out/get some guidance on how I now determine how to use the new column FieldCount eg it is 17, then substring(FieldValue ,1,1) given its anything from the 1st-9th. If its 18, then substring(FieldValue ,1,2) given its anything from the 10th up.
select *,
cast((utc+(60*60*12)*1000)/1000 as TIMESTAMP) as `LocalTime`,
case
when FieldValue REGEXP '^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$' then '17'
when FieldValue REGEXP '^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$' then '17'
end FieldCount,
substring(FieldValue ,6,4) as Years,
substring(FieldValue ,1,1) as Days,
substring(FieldValue ,10,2) as Months,
substring(FieldValue ,12,2) as Hours,
substring(FieldValue ,14,2) as Minut
from table.name
New Update, I changed this now to separate based on case condition. This basically separates out the value into separate fields. Any ideas to concat based on alias field names?
select
AField,
cast((UTC+(60*60*12)*1000)/1000 as TIMESTAMP) as `LocalTime`,
case when length(AField) = 18 then substring(AField,1,2) else substring(AField,1,1) end Days,
case when length(AField) = 18 then substring(AField,10,2) else substring(AField,9,2) end Months,
case when length(AField) = 18 then substring(AField,6,4) else substring(AField,5,4) end years,
case when length(AField) = 18 then substring(AField,12,2) else substring(caseid,11,2) end Hours,
case when length(AField) = 18 then substring(AField,14,2) else substring(AField,13,2) end minutes
from table.name
Correct timestamp string representation in Hive is yyyy-MM-dd HH:mm:ss.S
.
You do not need to extract all parts separately, then concat to get timestamps. Using regexp_replace you can build correct timestamp using backreferences to capturing groups (in round brackets) in the regexp.
with mytable as(--test dataset, use your table instead
select stack(2,
'219872021081000741',
'19872021081000741'
) as AField
)
select
case when length(AField) = 18
then timestamp(regexp_replace(AField,'^(\\d{2})\\d{3}(\\d{4})(\\d{2})(\\d{2})(\\d{2})\\d{3}$','$2-$3-$1 $4:$5:00.0'))
else timestamp(regexp_replace(AField,'^(\\d)\\d{3}(\\d{4})(\\d{2})(\\d{2})(\\d{2})\\d{3}$','$2-$3-0$1 $4:$5:00.0'))
end as result
from mytable
Result:
result
2021-08-21 10:00:00.0
2021-08-01 10:00:00.0
Note: timestamp()
construct here is to demonstrate that string produced is compatible with timestamp data type and is being cast correctly, you can keep it as string if you prefer.