
Identifying, counting, AND labelling spaces in a column?


I have a one-column data frame in R that holds a bunch of names, e.g. Claire Randall Fraser. I know how to write a loop that applies a second function to every cell, but I'm stuck on how to write that second function, which should identify and LABEL each space (" ") in each cell, e.g. Claire[1]Randall[2]Fraser.

Is there a way to do this? Thanks in advance, and please explain like I'm a beginner in R.
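The per-cell "second function" the question describes can be sketched in base R as below. This is a minimal sketch, not the method used in the solution: `label_spaces()`, `count_spaces()` and the column name `name` are hypothetical, and it assumes words are separated by single spaces. `vapply()` applies the function to every cell, so no hand-written loop is needed.

```r
# Hypothetical helper: label each space in ONE string with its position,
# e.g. "Claire Randall Fraser" -> "Claire[1]Randall[2]Fraser".
# Assumes words are separated by single spaces.
label_spaces <- function(x) {
  # split the string into words at every space
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  n <- length(words)
  # a single word (or empty string) has no spaces to label
  if (n <= 1) return(x)
  # attach "[1]", "[2]", ... to every word except the last, then add the last word
  paste0(paste0(words[-n], "[", seq_len(n - 1), "]", collapse = ""), words[n])
}

# Hypothetical helper: count spaces = length with spaces minus length without them
count_spaces <- function(x) {
  nchar(x) - nchar(gsub(" ", "", x, fixed = TRUE))
}

# Example data; the column name `name` is an assumption
df <- data.frame(
  name = c("Claire Randall Fraser", "Peter Dough", "Orson Dude Welles Man"),
  stringsAsFactors = FALSE
)

# vapply() runs label_spaces() on each cell and returns a character vector
df$labelled <- vapply(df$name, label_spaces, character(1), USE.NAMES = FALSE)
# count_spaces() is already vectorised, so it can be applied to the whole column
df$n_spaces <- count_spaces(df$name)
print(df)
```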


Solution

  • Here's an initial solution that mixes data.table and tidyverse (dplyr, tidyr) functions:

    Data:

    str <- c("Claire Randall Fraser", "Peter Dough", "Orson Dude Welles Man")
    

    Solution:

    library(data.table)  # for rleid()
    library(dplyr)       # for mutate(), group_by(), summarise() and %>%
    library(tidyr)       # for separate_rows()
    
    data.frame(str) %>%
      # create row ID:
      mutate(row = row_number()) %>%
      # split strings into separate words:
      separate_rows(str, sep = " ") %>%
      # for each `row`:
      group_by(row) %>%
      # create two new columns:
      mutate(
        # rleid(row) is 1 for every word of the same name, so cumsum() numbers the words 1, 2, 3, ...; wrap each number in "[]":
        add = paste0("[", cumsum(rleid(row)), "]"),
        # paste the separate names and the values in `add` together:
        str_0 = paste0(str, add)) %>%
      # put everything back onto the original rows:
      summarise(str = paste0(str_0, collapse = "")) %>%
      # deactivate grouping:
      ungroup() %>%
      # the last word needs no label, so remove the string-final `[n]`:
      mutate(str = sub("\\[\\d+\\]$", "", str))
    

    Result:

    # A tibble: 3 × 2
        row str                        
      <int> <chr>                      
    1     1 Claire[1]Randall[2]Fraser  
    2     2 Peter[1]Dough              
    3     3 Orson[1]Dude[2]Welles[3]Man
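The same pipeline can likely be written with dplyr and tidyr alone. The sketch below swaps `cumsum(rleid(row))` for dplyr's `row_number()` (which numbers rows within each group), labels only the non-final words with `if_else()` so the final `sub()` is not needed, and adds a space count, since the title also asks about counting. The column names `label` and `n_spaces` are hypothetical additions.

```r
library(dplyr)  # mutate(), group_by(), summarise(), row_number(), n(), if_else(), %>%
library(tidyr)  # separate_rows()

str <- c("Claire Randall Fraser", "Peter Dough", "Orson Dude Welles Man")

result <- data.frame(str, stringsAsFactors = FALSE) %>%
  # remember which name each word came from
  mutate(row = row_number()) %>%
  # one word per row
  separate_rows(str, sep = " ") %>%
  group_by(row) %>%
  # within each name, label every word except the last with its position
  mutate(label = if_else(row_number() < n(), paste0("[", row_number(), "]"), "")) %>%
  summarise(
    # glue the labelled words back into one string per name
    str = paste0(str, label, collapse = ""),
    # number of spaces = number of words minus one
    n_spaces = n() - 1
  )

print(result)
```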