Search code examples
rdplyrdata-cleaning

Nested Row Labels to Column


I have a CSV that appears to be the output of an Excel Pivot Table with names nested as row labels for repeating groups. I would like to clean the data so that the row labels are repeated in a separate column, ideally using dplyr.

The data looks like this:

dd <- data.frame(variables = c("Abington", "Number of Sales","YTD Number of Sales","Median Sale Price","YTD Median Sale Price", "Acton", "Number of Sales","YTD Number of Sales","Median Sale Price","YTD Median Sale Price"), Year1 = c(" ", 16, 50,415000,413500," ",23,60,799900,704000), Year2 = c(" ",8,13,583000,575000," ",9,39,995000,800000))

dd

variables              Year1   Year2
Abington              
Number of Sales        16      8
YTD Number of Sales    50      13
Median Sale Price      415000  583000
YTD Median Sale Price  413500  575000
Acton              
Number of Sales        23      9
YTD Number of Sales    60      39
Median Sale Price      799900  995000
YTD Median Sale Price  704000  800000

And I would like it to look like this:

Town          variables               Year1  Year2           
Abington      Number of Sales         16     8
Abington      YTD Number of Sales     50     13
Abington      Median Sale Price       415000 583000
Abington      YTD Median Sale Price   413500 575000          
Acton         Number of Sales         23      9
Acton         YTD Number of Sales     60     39
Acton         Median Sale Price       799900 995000
Acton         YTD Median Sale Price   704000 800000

Solution

  • We can use tidyverse (or dplyr & tidyr) for this:

    library(tidyverse)
    
    dd %>%
      mutate(Town = ifelse(Year1 == " " & Year2 == " ", variables, NA)) %>%
      fill(Town, .direction = "down") %>%
      filter(Town != variables) %>%
      relocate(Town)
    

    Resulting in:

          Town             variables  Year1  Year2
    1 Abington       Number of Sales     16      8
    2 Abington   YTD Number of Sales     50     13
    3 Abington     Median Sale Price 415000 583000
    4 Abington YTD Median Sale Price 413500 575000
    5    Acton       Number of Sales     23      9
    6    Acton   YTD Number of Sales     60     39
    7    Acton     Median Sale Price 799900 995000
    8    Acton YTD Median Sale Price 704000  8e+05
    

    Important to note that the empty values at Year1 and Year2 are actually whitespaces (" ") rather than empty strings or NA.