I have a CSV that appears to be the output of an Excel Pivot Table with names nested as row labels for repeating groups. I would like to clean the data so that the row labels are repeated in a separate column, ideally using dplyr.
The data looks like this:
dd <- data.frame(variables = c("Abington", "Number of Sales","YTD Number of Sales","Median Sale Price","YTD Median Sale Price", "Acton", "Number of Sales","YTD Number of Sales","Median Sale Price","YTD Median Sale Price"), Year1 = c(" ", 16, 50,415000,413500," ",23,60,799900,704000), Year2 = c(" ",8,13,583000,575000," ",9,39,995000,800000))
dd
variables Year1 Year2
Abington
Number of Sales 16 8
YTD Number of Sales 50 13
Median Sale Price 415000 583000
YTD Median Sale Price 413500 575000
Acton
Number of Sales 23 9
YTD Number of Sales 60 39
Median Sale Price 799900 995000
YTD Median Sale Price 704000 800000
And I would like it to look like this:
Town variables Year1 Year2
Abington Number of Sales 16 8
Abington YTD Number of Sales 50 13
Abington Median Sale Price 415000 583000
Abington YTD Median Sale Price 413500 575000
Acton Number of Sales 23 9
Acton YTD Number of Sales 60 39
Acton Median Sale Price 799900 995000
Acton YTD Median Sale Price 704000 800000
We can use tidyverse
(or dplyr
& tidyr
) for this:
library(tidyverse)
dd %>%
mutate(Town = ifelse(Year1 == " " & Year2 == " ", variables, NA)) %>%
fill(Town, .direction = "down") %>%
filter(Town != variables) %>%
relocate(Town)
Resulting in:
Town variables Year1 Year2
1 Abington Number of Sales 16 8
2 Abington YTD Number of Sales 50 13
3 Abington Median Sale Price 415000 583000
4 Abington YTD Median Sale Price 413500 575000
5 Acton Number of Sales 23 9
6 Acton YTD Number of Sales 60 39
7 Acton Median Sale Price 799900 995000
8 Acton YTD Median Sale Price 704000 8e+05
Important to note that the empty values at Year1
and Year2
are actually whitespaces (" ") rather than empty strings or NA.