
Separate rows of data by 0 values in R


I have a large time series stored as a data frame (n >= 6000) that looks like this:

              time, precip

1   2005-09-30 11:45:00,   0.08
2   2005-09-30 23:45:00,   0.72
3   2005-10-01 11:45:00,   0.01
4   2005-10-01 23:45:00,   0.08
5   2005-10-02 11:45:00,   0.10
6   2005-10-02 23:45:00,   0.33
7   2005-10-03 11:45:00,   0.15
8   2005-10-03 23:45:00,   0.30
9   2005-10-04 11:45:00,   0.00
10  2005-10-04 23:45:00,   0.00
11  2005-10-05 11:45:00,   0.02
12  2005-10-05 23:45:00,   0.00
13  2005-10-06 11:45:00,   0.00
14  2005-10-06 23:45:00,   0.01
15  2005-10-07 11:45:00,   0.00
16  2005-10-07 23:45:00,   0.00
17  2005-10-08 11:45:00,   0.00
18  2005-10-08 23:45:00,   0.16
19  2005-10-09 11:45:00,   0.03
20  2005-10-09 23:45:00,   0.00

Each row has a timestamp (YYYY-MM-DD HH:MM:SS, at 12-hour intervals) and a precipitation amount. I'd like to separate the data into storm events.

What I'd like to do is this: 1) add a new column called "storm"; 2) treat each run of non-zero precipitation values, bounded by 0s, as one storm and number the storms in order.

For example...

             Time,        Precip,  Storm

1   2005-09-30 11:45:00,   0.08,  1
2   2005-09-30 23:45:00,   0.72,  1
3   2005-10-01 11:45:00,   0.01,  1
4   2005-10-01 23:45:00,   0.08,  1
5   2005-10-02 11:45:00,   0.10,  1
6   2005-10-02 23:45:00,   0.33,  1
7   2005-10-03 11:45:00,   0.15,  1
8   2005-10-03 23:45:00,   0.30,  1
9   2005-10-04 11:45:00,   0.00
10  2005-10-04 23:45:00,   0.00
11  2005-10-05 11:45:00,   0.02,  2
12  2005-10-05 23:45:00,   0.00
13  2005-10-06 11:45:00,   0.00
14  2005-10-06 23:45:00,   0.01,  3
15  2005-10-07 11:45:00,   0.00
16  2005-10-07 23:45:00,   0.00
17  2005-10-08 11:45:00,   0.00
18  2005-10-08 23:45:00,   0.16,  4
19  2005-10-09 11:45:00,   0.03,  4
20  2005-10-09 23:45:00,   0.00

3) After that, my plan is to subset the data by storm event.

I am pretty new to R, so don't be afraid of pointing out the obvious. Your help would be much appreciated!


Solution

  • You can flag the time points that belong to a storm, then use rle (run-length encoding) and modify the result

    # assuming your data is called rainfall
    # identify whether precipitation was recorded at each time point
    rainfall$storm <- rainfall$precip > 0
    # do run-length encoding on this storm indicator
    storms <- rle(rainfall$storm)
    # set the FALSE values (dry runs) to NA
    is.na(storms$values) <- !storms$values
    # replace the TRUE values with a number in sequence
    storms$values[which(storms$values)] <- seq_len(sum(storms$values, na.rm = TRUE))
    # use inverse.rle to revert to the full length column
    rainfall$stormNumber <- inverse.rle(storms)
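
To check the approach against the sample in the question, and to cover the last step (subsetting by storm), here is a minimal sketch. It assumes the data frame is called `rainfall` with columns `time` and `precip`; the 20 rows are rebuilt from the question, and the names `storm_list` and `storm_1` are only illustrative.

```r
# Rebuild the 20 sample rows from the question (12-hourly timestamps)
rainfall <- data.frame(
  time = seq(as.POSIXct("2005-09-30 11:45:00", tz = "UTC"),
             by = "12 hours", length.out = 20),
  precip = c(0.08, 0.72, 0.01, 0.08, 0.10, 0.33, 0.15, 0.30, 0.00, 0.00,
             0.02, 0.00, 0.00, 0.01, 0.00, 0.00, 0.00, 0.16, 0.03, 0.00)
)

# Same rle-based steps as in the answer
rainfall$storm <- rainfall$precip > 0
storms <- rle(rainfall$storm)
is.na(storms$values) <- !storms$values
storms$values[which(storms$values)] <- seq_len(sum(storms$values, na.rm = TRUE))
rainfall$stormNumber <- inverse.rle(storms)

# One data frame per storm; rows with stormNumber == NA (dry periods) are dropped
storm_list <- split(rainfall, rainfall$stormNumber)

# Or pull out a single storm; subset() also drops the NA rows
storm_1 <- subset(rainfall, stormNumber == 1)
```

With the sample data, `storm_list` should hold four data frames named "1" to "4", matching the storm column shown in the question. Note that a missing value in `precip` would also end up as NA in `stormNumber`, so it may be worth deciding how to treat gaps before running this on the full series.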