Search code examples
rdplyrreshapetidyrmelt

Expanding data by month


I have the following dataframe with 2 columns: address, startdate, latitude and longitude. It is a listing of the month a given address was cleaned.

df = data.frame(address = c("1 ex St", "2 ex St"), 
               year = (c(2011,2011)),
               month = c("February","April"),
               latitude = c(341.32,343.3),
               longitude =c(432.3, 343.6))

So the data looks like this

  address   year   month    latitude   longitude
  1 ex St   2011   February 341.32     432.3
  2 ex St   2011   April    343.30     343.6

Right now each row represents a particular address and a specific month the address was cleaned. I want to 'expand' the data so that each entry in the address column is broken down into 12 rows for each month 2011. I would also like to add a dummy variable that indicates if the lot has been cleaned before. So the data should like this:

  address   year   month    latitude   longitude cleaned
  1 ex St   2011   January  341.32     432.3     0
  1 ex St   2011   February 341.32     432.3     1
  1 ex St   2011   March    341.32     432.3     1
  1 ex St   2011   April    341.32     432.3     1
  1 ex St   2011   May      341.32     432.3     1
  1 ex St   2011   June     341.32     432.3     1
  1 ex St   2011   July     341.32     432.3     1
  1 ex St   2011   August   341.32     432.3     1
  1 ex St   2011   Septmber 341.32     432.3     1
  1 ex St   2011   October  341.32     432.3     1
  1 ex St   2011   November 341.32     432.3     1
  1 ex St   2011   December 341.32     432.3     1
  2 ex St   2011   January  343.30     343.6     0
  2 ex St   2011   February 343.30     343.6     0
  2 ex St   2011   March    343.30     343.6     0
  2 ex St   2011   April    343.30     343.6     1
  2 ex St   2011   May      343.30     343.6     1
  2 ex St   2011   June     343.30     343.6     1
  2 ex St   2011   July     343.30     343.6     1
  2 ex St   2011   August   343.30     343.6     1
  2 ex St   2011   Septmber 343.30     343.6     1
  2 ex St   2011   October  343.30     343.6     1
  2 ex St   2011   November 343.30     343.6     1
  2 ex St   2011   December 343.30     343.6     1

Are there packages or functions that would allow me to expand my data month-wise in this way? I've looked at melt and reshape packages but they don't seem to work for my case. I'm not necessarily looking for an answer, just some guidance on what tools to use!

EDIT: I used the below answer but the cleaned column is still of. Here is the output.

       month address year latitude longitude cleaned
1    January 1 ex St 2011   341.32     432.3       0
2   February 1 ex St 2011   341.32     432.3       1
3      March 1 ex St 2011   341.32     432.3       0
4      April 1 ex St 2011   341.32     432.3       1
5        May 1 ex St 2011   341.32     432.3       0
6       June 1 ex St 2011   341.32     432.3       0
7       July 1 ex St 2011   341.32     432.3       0
8     August 1 ex St 2011   341.32     432.3       0
9  September 1 ex St 2011   341.32     432.3       1
10   October 1 ex St 2011   341.32     432.3       1
11  November 1 ex St 2011   341.32     432.3       0
12  December 1 ex St 2011   341.32     432.3       1
13   January 2 ex St 2011    343.3     343.6       1
14  February 2 ex St 2011    343.3     343.6       1
15     March 2 ex St 2011    343.3     343.6       0
16     April 2 ex St 2011    343.3     343.6       0
17       May 2 ex St 2011    343.3     343.6       1
18      June 2 ex St 2011    343.3     343.6       0
19      July 2 ex St 2011    343.3     343.6       1
20    August 2 ex St 2011    343.3     343.6       0
21 September 2 ex St 2011    343.3     343.6       0
22   October 2 ex St 2011    343.3     343.6       1
23  November 2 ex St 2011    343.3     343.6       1
24  December 2 ex St 2011    343.3     343.6       0

I suspect that the na.locf() function isn't working because the cleaned column is sampled from 0 to 1 and doesn't have any NA's in it to change. So the cleaned column right now is just a random sample of 0s and 1s. Is there another function/strategy I can use to get the 1's and 0's to correspond to before and after an address was cleaned?


Solution

  • Split by address, merge with all months, create dummy cleaned column. Then fill in NAs with existing values. And lastly order by address and month names:

    library(zoo) # na.locf to fill NAs
    
    do.call(rbind,
            lapply(split(df, df$address), function(i) {
              d <- merge(i, data.frame(month = month.name), all.y = TRUE)
              # convert to factor, then order by month, so it Jan, Feb, Mar, etc
              d$month <- factor(d$month, levels = month.name)
              d <- d[ order(d$month), ]
              # NA fill down
              d <- na.locf(d)
              # Make cleaned column 
              d$clened <- ifelse(is.na(d$address), 0, 1)
              # NA fill up
              d <- na.locf(d, fromLast = TRUE)
            }))
    
    #                month address year latitude longitude clened
    # 1 ex St.5    January 1 ex St 2011   341.32     432.3      0
    # 1 ex St.2   February 1 ex St 2011   341.32     432.3      1
    # 1 ex St.8      March 1 ex St 2011   341.32     432.3      1
    # 1 ex St.1      April 1 ex St 2011   341.32     432.3      1
    # 1 ex St.9        May 1 ex St 2011   341.32     432.3      1
    # 1 ex St.7       June 1 ex St 2011   341.32     432.3      1
    # 1 ex St.6       July 1 ex St 2011   341.32     432.3      1
    # 1 ex St.3     August 1 ex St 2011   341.32     432.3      1
    # 1 ex St.12 September 1 ex St 2011   341.32     432.3      1
    # 1 ex St.11   October 1 ex St 2011   341.32     432.3      1
    # 1 ex St.10  November 1 ex St 2011   341.32     432.3      1
    # 1 ex St.4   December 1 ex St 2011   341.32     432.3      1
    # 2 ex St.5    January 2 ex St 2011    343.3     343.6      0
    # 2 ex St.2   February 2 ex St 2011    343.3     343.6      0
    # 2 ex St.8      March 2 ex St 2011    343.3     343.6      0
    # 2 ex St.1      April 2 ex St 2011    343.3     343.6      1
    # 2 ex St.9        May 2 ex St 2011    343.3     343.6      1
    # 2 ex St.7       June 2 ex St 2011    343.3     343.6      1
    # 2 ex St.6       July 2 ex St 2011    343.3     343.6      1
    # 2 ex St.3     August 2 ex St 2011    343.3     343.6      1
    # 2 ex St.12 September 2 ex St 2011    343.3     343.6      1
    # 2 ex St.11   October 2 ex St 2011    343.3     343.6      1
    # 2 ex St.10  November 2 ex St 2011    343.3     343.6      1
    # 2 ex St.4   December 2 ex St 2011    343.3     343.6      1