r, text, split

Split text and build a table


After a questionnaire was completed, I received the answers to it. One of the questions was: how often do you use these languages in your work? The answers are in this format:

"A - Spanish 60 \r\nB - Both of them 10 \r\n C - English 30"
"B - Both of them 50 \r\n C - English 50"
"A - Spanish 30 \r\nC - English 70"

As you can see, each answer is made up of up to three parts, each preceded by A, B or C (Spanish, Both of them or English). However, the three parts do not always all appear, and a missing part should count as 0. What I would like to get is the following table:

Spanish | Both of them | English
   60          10           30
    0          50           50
   30           0           70

With strsplit(x, "\r\n") I separated the answers but I do not know how to continue.


Solution

  • One way to do this is to loop over the responses and fill in a matrix:

    # Pseudocode
    # 1. Init: a zero-filled result matrix with
    #          one row per response and
    #          3 columns for Spanish, Both of them, and English.
    # 2. For each response, split it with strsplit(response, "\r\n") and trim the extra spaces.
    #      2.1. For each line, split it into parts using strsplit(line, " "), and
    #           extract the option (A, B, or C) and the value.
    #           2.1.1. Based on the option, update the corresponding cell in the result matrix.
    # 3. Print the result matrix.
    

    Below is the sample code:

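    # Sample data from the question
    responses <- c(
      "A - Spanish 60 \r\nB - Both of them 10 \r\n C - English 30",
      "B - Both of them 50 \r\n C - English 50",
      "A - Spanish 30 \r\nC - English 70"
    )

    # init: one row per response, one column per option, 0 by default
    result <- matrix(0, nrow = length(responses), ncol = 3)
    colnames(result) <- c("Spanish", "Both of them", "English")

    for (i in seq_along(responses)) {
      response <- responses[i]
      lines <- strsplit(response, "\r\n")[[1]]

      for (line in lines) {
        line <- gsub("^\\s+|\\s+$", "", line)  # <-- Remove extra spaces
        if (line == "") next                   # skip empty pieces
        parts <- strsplit(line, " ")[[1]]
        option <- parts[1]                         # "A", "B" or "C"
        value <- as.numeric(parts[length(parts)])  # number at the end of the line
        column_name <- switch(option,
                              "A" = "Spanish",
                              "B" = "Both of them",
                              "C" = "English")
        result[i, column_name] <- value
      }
    }

    print(result)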
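
The same idea can also be sketched without the explicit double loop. This is only one possible variant, not a replacement for the code above: parse_response is a hypothetical helper name, responses is assumed to be a character vector holding the raw answers from the question, and every piece is assumed to look like "<letter> - <label> <number>". Pieces that do not match that shape are ignored.

```r
# Sample data copied from the question; `responses` is an assumed variable name.
responses <- c(
  "A - Spanish 60 \r\nB - Both of them 10 \r\n C - English 30",
  "B - Both of them 50 \r\n C - English 50",
  "A - Spanish 30 \r\nC - English 70"
)

# Hypothetical helper: turn one raw response into a named numeric vector.
parse_response <- function(response) {
  # Start from zeros so that missing options stay at 0.
  out <- c("Spanish" = 0, "Both of them" = 0, "English" = 0)
  labels <- c(A = "Spanish", B = "Both of them", C = "English")

  # Split on CR LF and trim surrounding whitespace from each piece.
  pieces <- trimws(strsplit(response, "\r\n", fixed = TRUE)[[1]])

  # Keep only pieces shaped like "<letter> - <label> <number>".
  pieces <- pieces[grepl("^[ABC] - .* [0-9]+$", pieces)]

  option <- substr(pieces, 1, 1)                 # leading letter
  value  <- as.numeric(sub("^.* ", "", pieces))  # text after the last space

  out[labels[option]] <- value
  out
}

# One row per response; column names come from the named vectors.
result <- do.call(rbind, lapply(responses, parse_response))
print(result)
```

Returning a vector that already contains zeros for all three options means a missing option needs no special handling, and rbind takes the column names from the vector names.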
    
