I have a data.frame that has a date index which is replicated in the first column and has a number of different columns and corresponding rows containing data. Because the rows containing data (which are indexed to the date) can vary depending on if corresponding data had been collected sometimes there can be a row (date) with many blanks and only values for several columns. I'd like to collapse these rows into an month and year.
I.e. A row can show data for yesterday but not today for all columns because it may not have been collected yet. Just looks aesthetically messy and would rather just say "June-2020" and collapse them to remove the NA's.
Here is the data.frame's dput:
structure(list(Date = structure(c(18292, 18320, 18321, 18351,
18352, 18382, 18413, 18427, 18428), tzone = "UTC", tclass = "Date", class = "Date"),
`M-o-M Change in Median Rent - AMH GA` = c(0, NA, 0, NA,
0, 0, 0, NA, 0), `Median Advertised Rent - AMH GA` = c(1695,
NA, 1695, NA, 1695, 1695, 1695, NA, 1695), `Median of Daily Rental Listings Available - AMH GA` = c(438,
NA, 430.5, NA, 450, 458, 385, NA, 331), `M-o-M Change in Median Rent - AMH Charlotte` = c(0,
NA, 0, NA, 0, 0.0272727272727273, 0, NA, 0.0324483775811208
), `Median Advertised Rent - AMH Charlotte` = c(1650, NA,
1650, NA, 1650, 1695, 1695, NA, 1750), `Median of Daily Rental Listings Available - AMH Charlotte` = c(244,
NA, 257, NA, 256, 270, 227, NA, 220), `M-o-M Change in Median Rent - AMH Dallas` = c(0,
NA, 0, NA, 0, 0, 0.0306406685236769, NA, 0), `Median Advertised Rent - AMH Dallas` = c(1795,
NA, 1795, NA, 1795, 1795, 1850, NA, 1850), `Median of Daily Rental Listings Available - AMH Dallas` = c(148,
NA, 150, NA, 166, 152.5, 131, NA, 135), `M-o-M Change in Median Rent - AMH Houston` = c(0,
NA, 0.0272727272727273, NA, 0, 0, 0, NA, 0), `Median Advertised Rent - AMH Houston` = c(1650,
NA, 1695, NA, 1695, 1695, 1695, NA, 1695), `Median of Daily Rental Listings Available - AMH Houston` = c(222,
NA, 223, NA, 228, 237.5, 203, NA, 189), `M-o-M Change in Median Rent - AMH Jacksonville` = c(0,
0.00681818181818183, NA, NA, -0.00677200902934538, 0, 0.0272727272727273,
NA, 0), `Median Advertised Rent - AMH Jacksonville` = c(1650,
1661.25, NA, NA, 1650, 1650, 1695, NA, 1695), `Median of Daily Rental Listings Available - AMH Jacksonville` = c(164,
179.5, NA, NA, 188, 195.5, 185, NA, 174), `M-o-M Change in Median Rent - AMH NC` = c(0,
NA, 0.0344827586206897, NA, 0, 0, 0.0272727272727273, NA,
0), `Median Advertised Rent - AMH NC` = c(1595, NA, 1650,
NA, 1650, 1650, 1695, NA, 1695), `Median of Daily Rental Listings Available - AMH NC` = c(365,
NA, 387, NA, 405, 447, 344, NA, 323), `M-o-M Change in Median Rent - AMH NV` = c(0,
NA, 0, 0, NA, -0.0265486725663717, 0, NA, 0.0272727272727273
), `Median Advertised Rent - AMH NV` = c(1695, NA, 1695,
1695, NA, 1650, 1650, NA, 1695), `Median of Daily Rental Listings Available - AMH NV` = c(62,
NA, 59, 70, NA, 71, 63, NA, 58), `M-o-M Change in Median Rent - AMH Orlando` = c(0,
0.0112676056338028, NA, NA, 0, 0, 0, NA, 0.0306406685236769
), `Median Advertised Rent - AMH Orlando` = c(1775, 1795,
NA, NA, 1795, 1795, 1795, NA, 1850), `Median of Daily Rental Listings Available - AMH Orlando` = c(82,
91.5, NA, NA, 106, 119, 117, NA, 105), `M-o-M Change in Median Rent - AMH Phoenix` = c(0,
NA, 0.0216172938350681, NA, 0.0258620689655173, -0.0252100840336135,
0.0344827586206897, 0, NA), `Median Advertised Rent - AMH Phoenix` = c(1561.25,
NA, 1595, NA, 1636.25, 1595, 1650, 1650, NA), `Median of Daily Rental Listings Available - AMH Phoenix` = c(130,
NA, 127, NA, 129, 131, 97, 85, NA), `M-o-M Change in Median Rent - AMH Raleigh` = c(0,
NA, 0.0290322580645161, NA, 0, 0, 0, NA, 0.0344827586206897
), `Median Advertised Rent - AMH Raleigh` = c(1550, NA, 1595,
NA, 1595, 1595, 1595, NA, 1650), `Median of Daily Rental Listings Available - AMH Raleigh` = c(91,
NA, 104, NA, 114, 142, 90, NA, 81), `M-o-M Change in Median Rent - AMH SoFla` = c(0,
-0.00869565217391299, NA, NA, 0.0233918128654971, -0.0314285714285715,
0.0162241887905605, NA, 0.0159651669085632), `Median Advertised Rent - AMH SoFla` = c(1725,
1710, NA, NA, 1750, 1695, 1722.5, NA, 1750), `Median of Daily Rental Listings Available - AMH SoFla` = c(11,
14, NA, NA, 11, 10, 7, NA, 7), `M-o-M Change in Median Rent - AMH Winston-Salem/Greensboro` = c(0,
NA, 0.0290322580645161, NA, 0, 0, 0, NA, 0), `Median Advertised Rent - AMH Winston-Salem/Greensboro` = c(1550,
NA, 1595, NA, 1595, 1595, 1595, NA, 1595), `Median of Daily Rental Listings Available - AMH Winston-Salem/Greensboro` = c(66,
NA, 68, NA, 72, 73, 71, NA, 63)), class = "data.frame", row.names = c("2020-01-31",
"2020-02-28", "2020-02-29", "2020-03-30", "2020-03-31", "2020-04-30",
"2020-05-31", "2020-06-14", "2020-06-15"))
Here are examples of two approaches I've tried:
test <- AMH_final_Monthly3 %>% mutate(month= month(Date), year=year(Date))
test2 <- AMH_final_Monthly3 %>%
collapse_by("monthly") %>%
dplyr::group_by(Date, add = TRUE)
test3 <- as.yearmon(AMH_final_Monthly3)
Help is appreciated!
Something like below? My settings are Dutch hence the weird month names. But I use the yearmon
function from tsibble, get rid of the Date column group by the yearmonth variable and summarise all variables using sum
. Now you could also use first
, last
, or any other function you want.
AMH_final_Monthly3 %>%
mutate(yearmonth = tsibble::yearmonth(Date)) %>%
select(-Date) %>%
group_by(yearmonth) %>%
summarise_all(.funs = "sum", na.rm = TRUE)
# A tibble: 6 x 37
yearmonth `M-o-M Change i~ `Median Adverti~ `Median of Dail~ `M-o-M Change i~ `Median Adverti~ `Median of Dail~ `M-o-M Change i~
<mth> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2020 jan 0 1695 438 0 1650 244 0
2 2020 feb 0 1695 430. 0 1650 257 0
3 2020 mrt 0 1695 450 0 1650 256 0
4 2020 apr 0 1695 458 0.0273 1695 270 0
5 2020 mei 0 1695 385 0 1695 227 0.0306
6 2020 jun 0 1695 331 0.0324 1750 220 0
# ... with 29 more variables: