convert UNIX epoch time to ISO8601 in a stream of data with perl


I'm calling an API through curl --write-out '\n%{http_code}\t=%{time_total}s\n'. The API reports its date fields in UNIX Epoch time instead of ISO8601, which makes it difficult to understand what's going on.

Sample Input:

{"message":"Domains list","list":[{"domain":"example.org","created":"1443042000","regtill":"1632430800"}]}
200 =0.126406s
{"list":[{"d":"abc","c":"1443042000"},{"d":"xyz","c":"1000000000"}]}
200 =0.126406s

Is there a way to find anything in this stream of data that looks like a recent UNIX Epoch time and convert it to an ISO8601-like date? Any 10-digit number in quotes should do: 1000000000 is 2001-09-09, as per env TZ=GMT-3 date -r1000000000 +%Y-%m-%dT%H%M%S%z with BSD date(1). I'm looking for a small shell script snippet in perl or BSD awk, without any extensive dependencies, to do the transformation.

Desired Output (with or without timezone offsets):

{"message":"Domains list","list":[
    {"domain":"example.org","created":"2015-09-24T000000+0300","regtill":"2021-09-24T000000+0300"}]}
200 =0.126406s
{"list":[
    {"d":"abc","c":"2015-09-24T000000+0300"},
    {"d":"xyz","c":"2001-09-09T044640+0300"}]}
200 =0.126406s

Solution

  • The best snippet so far has been provided by simbabque in a comment:

    $ perl -MTime::Piece -pe 's/(\d{10,})/{localtime($1)->datetime}/ge' <<<'{"message":"Domains list","list":[{"domain":"example.org","created":"1443042000","regtill":"1632430800"}]}'

    "You can pipe your curl through this." – simbabque


    I've further adapted it as follows:

    • modified the regular expression to use a positive lookbehind for \w":" (word character, quote, colon, quote) and a positive lookahead for "\W{2} (quote, then two non-word characters), written (?<=\w":") and (?="\W{2}) respectively, so that UNIX Epoch time is matched only as the JSON value of some key, avoiding false positives when other data is streamed into the script;
    • restricted UNIX Epoch time to exactly 10 digits, which covers dates between 2001 and 2286:
      • env TZ=GMT perl -MTime::Piece -e 'print localtime(1000000000)->datetime, "Z/", localtime(9999999999)->datetime, "Z\n"'
      • 2001-09-09T01:46:40Z/2286-11-20T17:46:39Z
    • used an extra substitution to insert a line break before each entry of the domain array;

    The simplest snippet, which rewrites any run of 10 consecutive digits, even inside a longer number:

    curl ... \
      | perl -MTime::Piece -pe's#(\d{10})#localtime($1)->datetime#ge'
    

    Adding a lookbehind and a lookahead for just the opening and closing quotes limits matches to quoted numbers of exactly 10 digits:

    curl ... \
      | perl -MTime::Piece -pe's#(?<=")(\d{10})(?=")#localtime($1)->datetime#ge'
    

    Set the timezone used to render the output (TZ=GMT-3 is a POSIX-style value meaning three hours east of Greenwich, i.e. UTC+3), use the more restrictive JSON-specific lookbehind/lookahead, and insert line breaks for each domain in the list to make the output more readable:

    curl ... \
      | env TZ=GMT-3 perl -MTime::Piece -p \
        -e 's#(?<=":\[|["\d]},)(?={")#\n\t#g;' \
        -e 's#(?<=\w":")(\d{10})(?="\W{2})#localtime($1)->datetime#ge;'
    

    Sample test run of the final solution. It works on any stream of text: lines that are not JSON pass through unchanged, and invalid JSON never causes an error, because the input is never parsed as JSON:

    % printf '{"list":[{"d":"abc","c":"1443042000"},{"d":"xyz","c":"1000000000"}]}\n' \
        | env TZ=GMT-3 perl -MTime::Piece -p \
        -e 's#(?<=":\[|["\d]},)(?={")#\n\t#g;' \
        -e 's#(?<=\w":")(\d{10})(?="\W{2})#localtime($1)->datetime#ge;'
    {"list":[
        {"d":"abc","c":"2015-09-24T00:00:00"},
        {"d":"xyz","c":"2001-09-09T04:46:40"}]}
    % 
    

    This solution is more flexible than solutions that presume the input is valid JSON. It can be used when a single curl command makes more than one request; the combined output is not valid JSON, since a JSON document can have only one root element. It also works with curl --write-out (e.g., curl -w '\n%{http_code}\t=%{time_total}s\n'), whose extra lines are likewise not valid JSON. Finally, it does not assume the input is in any specific format: any 10-digit number between \w":" and "\W{2} is converted from UNIX Epoch time to an ISO8601 date and time, without a timezone offset, which the question allows.