How can I parse a variable into multiple columns according to multiple conditions in R?

I'm new to R, so please bear with me. I am looking at incarceration data, and have a variable conviction, which is a messy string that looks like this:

[1] "Ct. 1: Conspiracy to distribute"                                                                         
[2] "Aggravated Assault"                                                                                      
[3] "Ct. 1: Possession of prohibited object; Ct. 2: criminal forfeiture"                                      
[4] "Ct. 1-6: Human Trafficking; Cts. 7, 8 Unlawful contact; Ct. 11: Involuntary Servitude; Ct. 36: Smuggling"

Ideally, I want to do two things. First, I want to parse on Ct. into multiple columns. For the first three rows, the data would look like this:

     convictions                              conviction_1                      conviction_2                    
[1,] "Ct. 1: Conspiracy to distribute"        "Conspiracy to distribute"        NA                   
[2,] "Aggravated Assault"                     "Aggravated Assault"              NA                   
[3,] "Ct. 1: Possession of prohibited object" "Possession of prohibited object" "criminal forfeiture"

but things get hairy when I get to the third row, because I would want to parse the first part of the string(Ct. 1-6: Human Trafficking) into 6 columns, and then Ct. 7,8: Unlawful contact into 2 more columns.

The second part is that I then want to generate a variable convictions_total which would find the highest number in the conviction string that follows after Ct:. for the three example entries I included here, convictions_total would look like:

[1]  1  2 36

This is the code I used to parse a much more straight-forward string variable, but I'm unsure how to tweak it for this variable:

cols <- data.frame(str_split_fixed(data$convictions`,",",Inf))
colnames(cols) <- paste0("conviction_",rep(1:length(cols)))
data <- cbind(data,cols)

Thank you in advance!

Solution

After going down a two-day rabbit hole, I figured out a tidy version of @LMc's code, which ended up working better because calling plyr was messing up other code I had written:

test_data <- 
  tibble(id = 1:5, 
         convictions = c("Ct. 1: Conspiracy to distribute"    ,                                                                     
                         "Aggravated Assault"              ,                                                                        
                         "Ct. 1: Possession of prohibited object; Ct. 2: criminal forfeiture"  ,                                    
                         "Ct. 1-6: Human Trafficking; Cts. 7, 8 Unlawful contact; Ct. 11: Involuntary Servitude; Ct. 36: Smuggling 50 grams",
                         "Ct. 1: Conspiracy; Cts. 2-7: Wire Fraud; Cts. 8-28:  Money Laundering"))
test_data <- test_data %>% 
  mutate(c2 = convictions) #this just duplicates the original variable convictions because I want to preserve it

test_data <- test_data %>%
  separate_rows(c2, sep = ";") %>%
  mutate(c2 = str_remove(c2, "Ct(s)?(\\. )(\\d|-|:|,|\\s)+")) %>%
  group_by(id) %>%
  mutate(conviction_number = paste0("c_", row_number())) %>%
  pivot_wider(values_from = c2, names_from = conviction_number) 


test_data <- test_data %>% 
  mutate(c2 = convictions) #again, just preserving the original variable

test_data <- test_data %>%
  separate_rows(c2, sep = ";") %>% 
  mutate(total_counts = as.numeric(ifelse(is.na(str_extract(c2, "((?<=\\-)\\d+)")), str_extract(c2, "\\d+"), str_extract(c2, "((?<=\\-)\\d+)")))) %>% 
  mutate(total_counts = ifelse(is.na(total_counts), 1, total_counts)) %>% 
  group_by(id) %>% 
  slice_max(total_counts)

which produces the following dataframe:

     id convictions                                                  c_1                c_2           c_3            c_4          c2                 total_counts
  <int> <chr>                                                        <chr>              <chr>         <chr>          <chr>        <chr>                     <dbl>
1     1 Ct. 1: Conspiracy to distribute                              Conspiracy to dis~  NA            NA             NA          "Ct. 1: Conspirac~            1
2     2 Aggravated Assault                                           Aggravated Assault  NA            NA             NA          "Aggravated Assau~            1
3     3 Ct. 1: Possession of prohibited object; Ct. 2: criminal for~ Possession of pro~ " criminal f~  NA             NA          " Ct. 2: criminal~            2
4     4 Ct. 1-6: Human Trafficking; Cts. 7, 8 Unlawful contact; Ct.~ Human Trafficking  " Unlawful c~ " Involuntary~ " Smuggling~ " Ct. 36: Smuggli~           36
5     5 Ct. 1: Conspiracy; Cts. 2-7: Wire Fraud; Cts. 8-28:  Money ~ Conspiracy         " Wire Fraud" " Money Laund~  NA          " Cts. 8-28:  Mon~           28

The first chunk of code parses the counts into separate rows, and then pivots back to the c_ columns. The second code chunk does the same parsing, but then looks across each entry to parse out the digits, instead of the words.

//d+ looks for any digit, but it turns out I had data that looked like Cts. 2-7 where I wanted the value 7, and not 2.

((?<=\\-)\\d+)")) Looks for the hyphen, and then parses the digits after it. If there is no hyphen, it defaults back to \\d+.

Finally, slice_max collapses the data down to 1 entry per ID based on the highest value of total_counts.