MATLAB create monthly timeseries from annual data


Hello, I am trying to explore some annual data, and it would be convenient to explore it month by month. To separate the data, I used this code for January:

d1 = '2021-01-01 00:00:00';
d2 = '2021-01-31 23:59:00';
t1 = datetime(d1,'InputFormat','yyyy-MM-dd HH:mm:ss');
t2 = datetime(d2,'InputFormat','yyyy-MM-dd HH:mm:ss');
idx_time = (date_time >= t1) & (date_time <= t2);

Is there an easier way to do this?


Solution

  • You could simply use the month method to extract the month component from date_time, like this:

    idx_time = month(date_time) == 1;
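
    One caveat worth illustrating: month alone does not distinguish years, so if date_time spans more than one year, the index above selects every January. The following is a minimal sketch on synthetic data; the hourly timestamps and the vals vector (standing in for the measured variable) are made up for illustration.

    ```matlab
    % Synthetic hourly timestamps from Jan 2021 through Jan 2022 (illustrative only)
    date_time = (datetime(2021,1,1):hours(1):datetime(2022,1,31,23,0,0))';
    vals = rand(size(date_time));   % hypothetical measurements

    % month() alone matches January of both years
    idx_anyJan = month(date_time) == 1;

    % Adding a year test restricts the selection to January 2021
    idx_time = month(date_time) == 1 & year(date_time) == 2021;
    janData = vals(idx_time);
    ```

    If the data covers a single year, the extra year test is unnecessary.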
    

    To create separate arrays for each month of data, you can use findgroups and splitapply, like this:

    [g, id] = findgroups(month(date_time));
    dataByMonth = splitapply(@(x) {x}, var, g)
    

    This results in dataByMonth being a cell array with one element per month present in the data (12x1 for a full year), where each element holds a single month of data. id tells you which month each element corresponds to.
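
    As a rough check of what the grouping produces, here is a small sketch on synthetic daily data. The names vals, nPerMonth and febData are hypothetical and only serve the illustration.

    ```matlab
    % One synthetic sample per day of 2021 (illustrative only)
    date_time = (datetime(2021,1,1):days(1):datetime(2021,12,31))';
    vals = (1:numel(date_time))';   % hypothetical measurements

    % Group by month number and collect each group into a cell
    [g, id] = findgroups(month(date_time));
    dataByMonth = splitapply(@(x) {x}, vals, g);

    % Number of samples in each month, in the order given by id
    nPerMonth = cellfun(@numel, dataByMonth);

    % Use id to look up a particular month, here February
    febData = dataByMonth{id == 2};
    ```

    Looking up a month through id rather than by position keeps the code correct when some months are missing from the data.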

    EDIT: Following discussion in the chat, it turned out that the following approach was what was needed:

    l = load('data.mat');
    % Create a timetable
    tt = timetable(l.date_time, l.var);
    % Aggregate per day
    amountPerDay = retime(tt, 'daily', 'sum')
    % Select days with non-zero amount
    anyPerDay = timetable(amountPerDay.Time, double(amountPerDay.Var1 > 0))
    % Compute the number of days per month with non-zero amount
    retime(anyPerDay, 'monthly', 'sum')
    

    (Note: double(amountPerDay.Var1 > 0) is used to work around a limitation in older versions of MATLAB, which do not permit retime to aggregate logical data.)
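
    The same pipeline can be tried without data.mat. The sketch below uses synthetic hourly data with rain on three known days; the variable names Amount and RainyDays are assumptions chosen for the example, and the row times are read through Properties.RowTimes so the code does not depend on what the time dimension is called.

    ```matlab
    % Synthetic hourly data for Jan-Feb 2021 with rain on three known days
    date_time = (datetime(2021,1,1):hours(1):datetime(2021,2,28,23,0,0))';
    vals = zeros(size(date_time));
    vals(date_time == datetime(2021,1,5,6,0,0))  = 1;
    vals(date_time == datetime(2021,1,20,6,0,0)) = 2;
    vals(date_time == datetime(2021,2,10,6,0,0)) = 3;

    % Build the timetable and aggregate per day
    tt = timetable(date_time, vals, 'VariableNames', {'Amount'});
    amountPerDay = retime(tt, 'daily', 'sum');

    % Flag days with a non-zero amount (as double, as in the note above)
    anyPerDay = timetable(amountPerDay.Properties.RowTimes, ...
        double(amountPerDay.Amount > 0), 'VariableNames', {'RainyDays'});

    % Count flagged days per month
    rainyDaysPerMonth = retime(anyPerDay, 'monthly', 'sum');
    ```

    With this input, January has two flagged days and February has one.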

    EDIT 2: To get the Time variable of the resulting timetable to display as a long month name, you can simply set the Format property of that variable:

    rainyDaysPerMonth = retime(anyPerDay, 'monthly', 'sum')
    rainyDaysPerMonth.Time.Format = 'MMMM'
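
    Setting Format changes only how the datetime values are displayed, not the values themselves. A small sketch on a standalone datetime vector (t and names are hypothetical, and the month names produced depend on the system locale):

    ```matlab
    % Three month-start timestamps (illustrative only)
    t = datetime(2021,1,1) + calmonths(0:2)';

    % Display as long month names; the underlying values are unchanged
    t.Format = 'MMMM';
    names = cellstr(t);

    % If the data spans several years, including the year avoids ambiguity
    t.Format = 'MMMM yyyy';
    ```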
    

    EDIT 3: Getting the rainiest day per month needs splitapply and a small helper function, like this:

    g = findgroups(month(amountPerDay.Time));
    % Use splitapply to find the day with the maximum amount. Also
    % need to return the day on which that occurred, so we need a small
    % helper function
    rainiestDayPerMonth = splitapply(@iMaxAndLoc, amountPerDay.Time, ...
        amountPerDay.Var1, g);
    
    % Given vectors of time and value, return a single-row table
    % with the time at which the max value occurs, and that max value.
    % (In a script, local functions like this go at the end of the file.)
    function out = iMaxAndLoc(t, v)
        [maxV, idx] = max(v);
        out = table(t(idx), maxV, 'VariableNames', {'Time', 'Value'});
    end
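
    For comparison, the same per-month maximum can be computed with a plain loop instead of splitapply and a helper. This is an alternative sketch, not the approach above; the data is synthetic and the names times, amount, rainiestDay and maxAmount are made up for the example.

    ```matlab
    % Synthetic daily totals for Jan-Mar 2021 with one known peak per month
    times = (datetime(2021,1,1):days(1):datetime(2021,3,31))';
    amount = zeros(size(times));
    amount(times == datetime(2021,1,15)) = 5;
    amount(times == datetime(2021,2,3))  = 7;
    amount(times == datetime(2021,3,28)) = 2;

    % Group by month number, then find the max and its date in each group
    g = findgroups(month(times));
    nGroups = max(g);
    rainiestDay = NaT(nGroups, 1);
    maxAmount = zeros(nGroups, 1);
    for k = 1:nGroups
        tk = times(g == k);
        vk = amount(g == k);
        [maxAmount(k), idx] = max(vk);   % first occurrence wins on ties
        rainiestDay(k) = tk(idx);
    end
    result = table(rainiestDay, maxAmount);
    ```

    As with the splitapply version, grouping by month number pools the same month from different years, so data spanning several years would need the year included in the grouping.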