Applying strsplit() on data.frame results in unexpected output


I have one data frame and two functions:

My dataframe:

s_words <- c("one,uno", "two,dos", "three,tres", "four,cuatro")
n_nums <- c(10, 20, 30, 40)
df1 <- data.frame(n_nums, s_words) 
> df1
  n_nums     s_words
1     10     one,uno
2     20     two,dos
3     30  three,tres
4     40 four,cuatro

My two functions:

f_op1 <- function(s_input) {
  s_ret <- paste0("***", s_input, "***")
  return(s_ret)
}


f_op2 <- function(s_input) {
  a_segments <- unlist(strsplit(s_input, split="\\W+"))
  s_eng <- a_segments[1]
  s_spa <- a_segments[2]
  s_ret <- paste0("*", s_eng, "***", s_spa, "*")
  return(s_ret)
}

When I apply my functions to the data frame:

df1$s_op1 <- f_op1(df1$s_words)
df1$s_op2 <- f_op2(df1$s_words)

I get this:

> df1
  n_nums     s_words             s_op1       s_op2
1     10     one,uno     ***one,uno*** *one***uno*
2     20     two,dos     ***two,dos*** *one***uno*
3     30  three,tres  ***three,tres*** *one***uno*
4     40 four,cuatro ***four,cuatro*** *one***uno*

But I need something like this:

> df1
  n_nums     s_words             s_op1           s_op2
1     10     one,uno     ***one,uno***     *one***uno*
2     20     two,dos     ***two,dos***     *two***dos*
3     30  three,tres  ***three,tres***  *three***tres*
4     40 four,cuatro ***four,cuatro*** *four***cuatro*

f_op2() is only for demonstration purposes; in reality it is more complex, but it does use strsplit().


Solution

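  • strsplit() returns a list with one character vector per input string. In the original f_op2(), unlist() flattens that list into a single vector, so a_segments[1] and a_segments[2] are always "one" and "uno"; the function then returns a single string, which the assignment to df1$s_op2 recycles across every row. Keeping the list and using sapply() to extract the relevant part from each vector avoids this: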

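    f_op2 <- function(s_input) {
      # Keep the list: one character vector of pieces per input string
      a_segments <- strsplit(s_input, split = "\\W+")
      # Take the first and second piece of every element (\(x) needs R >= 4.1)
      s_eng <- sapply(a_segments, \(x) x[1])
      s_spa <- sapply(a_segments, \(x) x[2])
      s_ret <- paste0("*", s_eng, "***", s_spa, "*")
      return(s_ret)
    }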
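
To see where the repeated first row comes from, it can help to inspect what strsplit() returns for the sample column. The following is a minimal sketch using the data from the question; the names pieces, flat and s_op2 are introduced here for illustration only, and vapply() is shown as a stricter alternative to sapply(), not as part of the original answer.

```r
s_words <- c("one,uno", "two,dos", "three,tres", "four,cuatro")

# strsplit() is vectorised: it returns a list holding one character
# vector of pieces for each input string
pieces <- strsplit(s_words, split = "\\W+")
length(pieces)   # one list element per row
pieces[[2]]      # the pieces of the second string only

# unlist() flattens the list into one long vector, so indexing with
# [1] and [2] always picks the pieces of the first row
flat <- unlist(pieces)
flat[1:2]

# vapply() works like sapply() but also checks that every result is
# a single character string, which guards against malformed input
s_eng <- vapply(pieces, function(x) x[1], character(1))
s_spa <- vapply(pieces, function(x) x[2], character(1))
s_op2 <- paste0("*", s_eng, "***", s_spa, "*")
s_op2
```

Because function(x) is used instead of the \(x) shorthand, this sketch should also run on R versions older than 4.1.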