I have the following dataframe with 2 columns: address, startdate, latitude and longitude. It is a listing of the month a given address was cleaned.
df = data.frame(address = c("1 ex St", "2 ex St"),
year = (c(2011,2011)),
month = c("February","April"),
latitude = c(341.32,343.3),
longitude =c(432.3, 343.6))
So the data looks like this
address year month latitude longitude
1 ex St 2011 February 341.32 432.3
2 ex St 2011 April 343.30 343.6
Right now each row represents a particular address and a specific month the address was cleaned. I want to 'expand' the data so that each entry in the address column is broken down into 12 rows for each month 2011. I would also like to add a dummy variable that indicates if the lot has been cleaned before. So the data should like this:
address year month latitude longitude cleaned
1 ex St 2011 January 341.32 432.3 0
1 ex St 2011 February 341.32 432.3 1
1 ex St 2011 March 341.32 432.3 1
1 ex St 2011 April 341.32 432.3 1
1 ex St 2011 May 341.32 432.3 1
1 ex St 2011 June 341.32 432.3 1
1 ex St 2011 July 341.32 432.3 1
1 ex St 2011 August 341.32 432.3 1
1 ex St 2011 Septmber 341.32 432.3 1
1 ex St 2011 October 341.32 432.3 1
1 ex St 2011 November 341.32 432.3 1
1 ex St 2011 December 341.32 432.3 1
2 ex St 2011 January 343.30 343.6 0
2 ex St 2011 February 343.30 343.6 0
2 ex St 2011 March 343.30 343.6 0
2 ex St 2011 April 343.30 343.6 1
2 ex St 2011 May 343.30 343.6 1
2 ex St 2011 June 343.30 343.6 1
2 ex St 2011 July 343.30 343.6 1
2 ex St 2011 August 343.30 343.6 1
2 ex St 2011 Septmber 343.30 343.6 1
2 ex St 2011 October 343.30 343.6 1
2 ex St 2011 November 343.30 343.6 1
2 ex St 2011 December 343.30 343.6 1
Are there packages or functions that would allow me to expand my data month-wise in this way? I've looked at melt and reshape packages but they don't seem to work for my case. I'm not necessarily looking for an answer, just some guidance on what tools to use!
EDIT: I used the below answer but the cleaned column is still of. Here is the output.
month address year latitude longitude cleaned
1 January 1 ex St 2011 341.32 432.3 0
2 February 1 ex St 2011 341.32 432.3 1
3 March 1 ex St 2011 341.32 432.3 0
4 April 1 ex St 2011 341.32 432.3 1
5 May 1 ex St 2011 341.32 432.3 0
6 June 1 ex St 2011 341.32 432.3 0
7 July 1 ex St 2011 341.32 432.3 0
8 August 1 ex St 2011 341.32 432.3 0
9 September 1 ex St 2011 341.32 432.3 1
10 October 1 ex St 2011 341.32 432.3 1
11 November 1 ex St 2011 341.32 432.3 0
12 December 1 ex St 2011 341.32 432.3 1
13 January 2 ex St 2011 343.3 343.6 1
14 February 2 ex St 2011 343.3 343.6 1
15 March 2 ex St 2011 343.3 343.6 0
16 April 2 ex St 2011 343.3 343.6 0
17 May 2 ex St 2011 343.3 343.6 1
18 June 2 ex St 2011 343.3 343.6 0
19 July 2 ex St 2011 343.3 343.6 1
20 August 2 ex St 2011 343.3 343.6 0
21 September 2 ex St 2011 343.3 343.6 0
22 October 2 ex St 2011 343.3 343.6 1
23 November 2 ex St 2011 343.3 343.6 1
24 December 2 ex St 2011 343.3 343.6 0
I suspect that the na.locf() function isn't working because the cleaned column is sampled from 0 to 1 and doesn't have any NA's in it to change. So the cleaned column right now is just a random sample of 0s and 1s. Is there another function/strategy I can use to get the 1's and 0's to correspond to before and after an address was cleaned?
Split by address, merge with all months, create dummy cleaned column. Then fill in NAs with existing values. And lastly order by address and month names:
library(zoo) # na.locf to fill NAs
do.call(rbind,
lapply(split(df, df$address), function(i) {
d <- merge(i, data.frame(month = month.name), all.y = TRUE)
# convert to factor, then order by month, so it Jan, Feb, Mar, etc
d$month <- factor(d$month, levels = month.name)
d <- d[ order(d$month), ]
# NA fill down
d <- na.locf(d)
# Make cleaned column
d$clened <- ifelse(is.na(d$address), 0, 1)
# NA fill up
d <- na.locf(d, fromLast = TRUE)
}))
# month address year latitude longitude clened
# 1 ex St.5 January 1 ex St 2011 341.32 432.3 0
# 1 ex St.2 February 1 ex St 2011 341.32 432.3 1
# 1 ex St.8 March 1 ex St 2011 341.32 432.3 1
# 1 ex St.1 April 1 ex St 2011 341.32 432.3 1
# 1 ex St.9 May 1 ex St 2011 341.32 432.3 1
# 1 ex St.7 June 1 ex St 2011 341.32 432.3 1
# 1 ex St.6 July 1 ex St 2011 341.32 432.3 1
# 1 ex St.3 August 1 ex St 2011 341.32 432.3 1
# 1 ex St.12 September 1 ex St 2011 341.32 432.3 1
# 1 ex St.11 October 1 ex St 2011 341.32 432.3 1
# 1 ex St.10 November 1 ex St 2011 341.32 432.3 1
# 1 ex St.4 December 1 ex St 2011 341.32 432.3 1
# 2 ex St.5 January 2 ex St 2011 343.3 343.6 0
# 2 ex St.2 February 2 ex St 2011 343.3 343.6 0
# 2 ex St.8 March 2 ex St 2011 343.3 343.6 0
# 2 ex St.1 April 2 ex St 2011 343.3 343.6 1
# 2 ex St.9 May 2 ex St 2011 343.3 343.6 1
# 2 ex St.7 June 2 ex St 2011 343.3 343.6 1
# 2 ex St.6 July 2 ex St 2011 343.3 343.6 1
# 2 ex St.3 August 2 ex St 2011 343.3 343.6 1
# 2 ex St.12 September 2 ex St 2011 343.3 343.6 1
# 2 ex St.11 October 2 ex St 2011 343.3 343.6 1
# 2 ex St.10 November 2 ex St 2011 343.3 343.6 1
# 2 ex St.4 December 2 ex St 2011 343.3 343.6 1