Datetimes for Julia dataframes


pandas has a number of very handy utilities for manipulating datetime indices. Is there any similar functionality in Julia? I have not found any tutorials for working with such things, though it obviously must be possible.

Some examples of pandas utilities:

dti = pd.to_datetime(
    ["1/1/2018", np.datetime64("2018-01-01"), datetime.datetime(2018, 1, 1)]
)

dti = pd.date_range("2018-01-01", periods=3, freq="H")

dti = dti.tz_localize("UTC")

dti.tz_convert("US/Pacific")

idx = pd.date_range("2018-01-01", periods=5, freq="H")
ts = pd.Series(range(len(idx)), index=idx)
ts.resample("2H").mean()

Solution

  • Julia packages follow a "do one thing and do it well" philosophy, so the ecosystem resembles Unix (a set of small tools that are combined to reach a common goal) more than Python's. Hence dates and data frames live in separate libraries:

    julia> using Dates, DataFrames
    

    Going through some of the examples from your question:

    Pandas

    dti = pd.to_datetime(
        ["1/1/2018", np.datetime64("2018-01-01"), datetime.datetime(2018, 1, 1)]
    )
    

    Julia

    julia> DataFrame(dti=[Date("1/1/2018", "m/d/y"), Date("2018-01-01"), Date(2018,1,1)])
    3×1 DataFrame
     Row │ dti
         │ Date
    ─────┼────────────
       1 │ 2018-01-01
       2 │ 2018-01-01
       3 │ 2018-01-01
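
    When a whole column of strings shares one known format, a possible sketch is to build the format once and reuse it for every element. The `raw` vector here is made-up illustration data, not part of the question:

    ```julia
    using Dates

    # Hypothetical input: date strings in month/day/year order
    raw = ["1/1/2018", "2/15/2018", "12/31/2018"]

    # Compile the format once, then parse each string with it
    fmt = dateformat"m/d/y"
    dates = [Date(s, fmt) for s in raw]

    println(dates)
    ```

    The resulting `Vector{Date}` can be passed to `DataFrame(dti=dates)` exactly as in the example above.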
    

    Pandas

    dti = pd.date_range("2018-01-01", periods=3, freq="H")
    

    Julia

    julia> DateTime("2018-01-01")  .+ Hour.(0:2)
    3-element Vector{DateTime}:
     2018-01-01T00:00:00
     2018-01-01T01:00:00
     2018-01-01T02:00:00
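
    If you prefer something closer to a range object than a broadcast, a minimal alternative sketch uses the colon syntax with a `Period` as the step. Unlike `periods=3`, it takes an explicit end point:

    ```julia
    using Dates

    start = DateTime(2018, 1, 1)

    # Lazy range from start to start + 2 hours, in one-hour steps
    dti = start:Hour(1):(start + Hour(2))

    # Materialize it only when a Vector is actually needed
    dti_vec = collect(dti)
    println(dti_vec)
    ```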
    

    Pandas

    dti = dti.tz_localize("UTC")
    
    dti.tz_convert("US/Pacific")
    

    Julia

    Note that there is a separate library in Julia for time zones. Additionally, "US/Pacific" is a legacy name of a time zone.

    julia> using TimeZones
    
    julia> dti = ZonedDateTime.(dti, tz"UTC")
    3-element Vector{ZonedDateTime}:
     2018-01-01T00:00:00+00:00
     2018-01-01T01:00:00+00:00
     2018-01-01T02:00:00+00:00
    
    julia> astimezone.(dti, TimeZone("US/Pacific", TimeZones.Class(:LEGACY)))
    3-element Vector{ZonedDateTime}:
     2017-12-31T16:00:00-08:00
     2017-12-31T17:00:00-08:00
     2017-12-31T18:00:00-08:00
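
    If you do not need the legacy name specifically, a sketch using the current IANA name for the same zone, "America/Los_Angeles", avoids the extra class argument. This assumes the TimeZones package is installed:

    ```julia
    using Dates, TimeZones

    # Same three hourly timestamps as above, localized to UTC
    utc = ZonedDateTime.(DateTime(2018, 1, 1) .+ Hour.(0:2), tz"UTC")

    # Convert to Pacific time via the non-legacy zone name
    pacific = astimezone.(utc, tz"America/Los_Angeles")

    println(pacific)
    ```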
    

    Pandas

    idx = pd.date_range("2018-01-01", periods=5, freq="H")
    ts = pd.Series(range(len(idx)), index=idx)
    ts.resample("2H").mean()
    

    Julia

    For resampling or other complex manipulations you will want to use the split-apply-combine pattern (see https://docs.juliahub.com/DataFrames/AR9oZ/1.3.1/man/split_apply_combine/)

    julia> df = DataFrame(date=DateTime("2018-01-01")  .+ Hour.(0:4), vals=1:5)
    5×2 DataFrame
     Row │ date                 vals
         │ DateTime             Int64
    ─────┼────────────────────────────
       1 │ 2018-01-01T00:00:00      1
       2 │ 2018-01-01T01:00:00      2
       3 │ 2018-01-01T02:00:00      3
       4 │ 2018-01-01T03:00:00      4
       5 │ 2018-01-01T04:00:00      5
    julia> df.date2 = floor.(df.date, Hour(2));
    
    julia> using StatsBase
    
    julia> combine(groupby(df, :date2), :vals => mean => :vals_mean)
    3×2 DataFrame
     Row │ date2                vals_mean
         │ DateTime             Float64
    ─────┼────────────────────────────────
       1 │ 2018-01-01T00:00:00        1.5
       2 │ 2018-01-01T02:00:00        3.5
       3 │ 2018-01-01T04:00:00        5.0
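
    The binning idea (floor each timestamp to a 2-hour boundary, then average per bin) does not depend on DataFrames. As a rough sketch using only the standard library, with `resample_mean` being a hypothetical helper name and the values `0:4` chosen to mirror `range(len(idx))` from the pandas snippet:

    ```julia
    using Dates, Statistics

    # Hypothetical helper: mean of `vals` per time bin of width `width`
    function resample_mean(times::AbstractVector{DateTime},
                           vals::AbstractVector{<:Real},
                           width::Period)
        bins = Dict{DateTime, Vector{Float64}}()
        for (t, v) in zip(times, vals)
            # floor(t, width) is the start of the bin that contains t
            push!(get!(bins, floor(t, width), Float64[]), v)
        end
        # Sort by bin start so the result reads chronologically
        return [k => mean(bins[k]) for k in sort!(collect(keys(bins)))]
    end

    times = DateTime(2018, 1, 1) .+ Hour.(0:4)
    result = resample_mean(times, 0:4, Hour(2))

    foreach(println, result)
    ```

    For anything beyond a quick aggregation, the `groupby`/`combine` route is likely the more convenient choice, since it keeps the result as a `DataFrame`.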