Search code examples
rexceliterationpurrr

Conditionally concatenate strings in vector based on a value in secondary vector in R


I am working with reports in the form of Excel spreadsheets, and they have headings with varying indent levels (example results from tidyxl below). The indent levels can also be inconsistent between heading groups as in the example of Heading 1 versus Heading 2.

contents <- c('Heading 1', 'Subheading 1', 'Item 1', 'Item 2', 'Subheading 2', 'Item 1', 'Item 2', 'Heading 2','Subheading 1','Item 1','Item 2')
indents <- c(0,1,2,2,1,2,2,0,2,4,4)
df <- data.frame(indents,contents)
   indents     contents
1        0    Heading 1
2        1 Subheading 1
3        2       Item 1
4        2       Item 2
5        1 Subheading 2
6        2       Item 1
7        2       Item 2
8        0    Heading 2
9        2 Subheading 1
10       4       Item 1
11       4       Item 2

I would like to produce a vector of headings where headings are concatenated together, like is shown below under 'headings.'

headings <- c('Heading 1','Heading 1 | Subheading 1','Heading 1 | Subheading 1 | Item 1','Heading 1 | Subheading 1 | Item 2','Heading 1 | Subheading 2','Heading 1 | Subheading 2 | Item 1','Heading 1 | Subheading 2 | Item 2','Heading 2','Heading 2 | Subheading 1','Heading 2 | Subheading 1 | Item 1','Heading 2 | Subheading 1 | Item 2')
data.frame(indents,contents, headings)
   indents     contents                          headings
1        0    Heading 1                         Heading 1
2        1 Subheading 1          Heading 1 | Subheading 1
3        2       Item 1 Heading 1 | Subheading 1 | Item 1
4        2       Item 2 Heading 1 | Subheading 1 | Item 2
5        1 Subheading 2          Heading 1 | Subheading 2
6        2       Item 1 Heading 1 | Subheading 2 | Item 1
7        2       Item 2 Heading 1 | Subheading 2 | Item 2
8        0    Heading 2                         Heading 2
9        2 Subheading 1          Heading 2 | Subheading 1
10       4       Item 1 Heading 2 | Subheading 1 | Item 1
11       4       Item 2 Heading 2 | Subheading 1 | Item 2

I have searched for a solution but haven't found anything that does what I need. I'm imagining a paste or paste0 call within a loop or apply function may do it but I haven't had any luck thus far.


Solution

  • This doesn't feel very efficient. I'm trying to imagine a good nested list object, or maybe something else. Perhaps a better answer will come along.

    That being said, something like this might work, though I'd probably test some edge cases. It just goes down the rows and stores previous headings that will need to be used. The indents 0 to 4 are stored in index positions 1 to 5 in the stored_headings vector.

    contents <- c(
      "Heading 1", "Subheading 1", "Item 1", "Item 2", "Subheading 2", "Item 1",
      "Item 2", "Heading 2", "Subheading 1", "Item 1", "Item 2"
    )
    indents <- c(0, 1, 2, 2, 1, 2, 2, 0, 2, 4, 4)
    df <- data.frame(indents, contents)
    
    # Store up to 5 levels of headings in memory (in this case for 0 through 4)
    # Use extra level at end to avoid error in assignment below,
    # so we use a vector of length 6
    stored_headings <- rep(NA, 6) |> as.character()
    
    # Logic according to indent level, using +1 on everything to
    # agree with R indexing starting at 1
    for (i in seq_len(nrow(df))) {
    
      # Store heading at appropriate level in stored_headings
      stored_headings[df$indents[i] + 1] <- df$contents[i]
    
      # Remove all headings below (greater indent) than current
      stored_headings[(df$indents[i] + 1 + 1):length(stored_headings)] <- NA
    
      # Concatenate headings, removing NA headings between or below
      df$headings[i] <- stored_headings[!is.na(stored_headings)] |>
        paste(collapse = " | ")
    
    }
    
    df
    
       indents     contents                          headings
    1        0    Heading 1                         Heading 1
    2        1 Subheading 1          Heading 1 | Subheading 1
    3        2       Item 1 Heading 1 | Subheading 1 | Item 1
    4        2       Item 2 Heading 1 | Subheading 1 | Item 2
    5        1 Subheading 2          Heading 1 | Subheading 2
    6        2       Item 1 Heading 1 | Subheading 2 | Item 1
    7        2       Item 2 Heading 1 | Subheading 2 | Item 2
    8        0    Heading 2                         Heading 2
    9        2 Subheading 1          Heading 2 | Subheading 1
    10       4       Item 1 Heading 2 | Subheading 1 | Item 1
    11       4       Item 2 Heading 2 | Subheading 1 | Item 2