R function `paste` is inverting the order of its arguments


I am simply trying to print the output of a statistical process I'm running in R, and I find that the paste function appears to be inverting its inputs. Here's a MWE:

df = data.frame(p_values=c(0.01, 0.001, 0.1, 0.01))
min_index = which.min(df$p_values)
print(paste(str(min_index), 
            " with p-value ",
            str(df$p_values[min_index])))

I'm expecting output more-or-less like this:

2 with p-value 0.001

Instead, I'm getting the highly unintuitive result

 int 2
 num 0.001
[1] "  with p-value  "

In addition to the unexpected order of the printing, I'm getting the int and num and [1], as well as not all being on one line.

What's going on here? I had thought paste was nearly a drop-in replacement for Python's concat operator +, but this result has me scratching my head.


Solution

  • structure not string

    str() in R is not the same as the Python str() function, which coerces an object to a string. In R, the str() function exists to:

    Compactly display the internal structure of an R object, a diagnostic function and an alternative to summary (and to some extent, dput).

    This means when you do str(min_index), R tells you that the structure of min_index is that it's an integer vector of length one, and the value of its element is 2.

    The equivalent R function to Python's str() would be toString(), or perhaps as.character(). However, in general R is more forgiving about types than Python. paste() and all other string concatenation or printing commands I can think of will coerce numbers to strings for you, so you can just do:

    paste(min_index, "with p-value", df$p_values[min_index])
    # [1] "2 with p-value 0.001"
    

    Note that I removed the spaces around your string, because paste() inserts a single space between its arguments by default. You can change that by supplying a different sep argument, or use paste0(), which inserts nothing.
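
    To make the two points above concrete, here is a small sketch in base R (the variable names are made up for illustration). as.character() converts element by element, toString() collapses a whole vector into one comma-separated string, and sep or paste0() controls what goes between the pasted pieces:

    ```r
    # Coercion: as.character() keeps one string per element,
    # toString() joins all elements into a single string with ", "
    as_chars  <- as.character(c(1, 2, 3))   # "1" "2" "3"
    as_string <- toString(c(1, 2, 3))       # "1, 2, 3"

    # Separators: paste() defaults to sep = " ", paste0() uses none
    with_space <- paste("p", "=", 0.001)       # "p = 0.001"
    with_dash  <- paste("a", "b", sep = "-")   # "a-b"
    no_sep     <- paste0("p=", 0.001)          # "p=0.001"
    ```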

    Why the output order seems inverted

    Accounting for this, you might expect your output to be "int 2 with p-value num 0.001". However, it is:

     int 2
     num 0.001
    [1] "  with p-value  "
    

    This is because str() prints its output as a side effect and returns NULL (invisibly) rather than a string:

    x <- str(1) # "num 1" is printed
    print(x) # NULL
    

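    R evaluates the arguments to paste() before paste() can return anything, so both str() calls print first and each hands NULL to paste(). Your command is effectively:

    str(min_index) # int 2
    str(df$p_values[min_index]) # num 0.001
    paste(NULL, " with p-value ", NULL) # [1] "  with p-value  "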
    

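    This is why it prints in the order you see. paste() treats each NULL as an empty string but still inserts its separator, which accounts for the extra spaces around the text.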
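
    The same ordering can be reproduced without str(). A minimal sketch, where noisy() is a hypothetical helper invented here for illustration: like str(), it prints something and returns NULL invisibly.

    ```r
    # Hypothetical helper (not part of base R): prints as a side effect
    # and returns NULL invisibly, mimicking how str() behaves
    noisy <- function(label) {
      cat("evaluating", label, "\n")
      invisible(NULL)
    }

    # Both "evaluating ..." lines appear while paste() is still collecting
    # its arguments; the pasted string only exists after that
    result <- paste(noisy("first"), "middle", noisy("second"))
    print(result)
    ```

    Running this prints the two "evaluating" lines before the pasted string, which mirrors the int/num lines appearing ahead of the [1] line in your output.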

    A note on displaying output

    As Onyambu says in the comments, if you want to display the output without the [1] you can use cat():

    cat(
      min_index, "with p-value", df$p_values[min_index],
      fill = TRUE
    )
    # 2 with p-value 0.001
    

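    Note that fill = TRUE is what makes cat() end the output with a newline; without it (or an explicit "\n"), whatever is printed next continues on the same line. Depending on the purpose of your output, you may also want to look at message(), which writes to stderr and appends a newline itself.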
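
    For comparison, here is a sketch of two alternatives to fill = TRUE. The values are hard-coded from the example above, and p_value is just an illustrative name:

    ```r
    min_index <- 2L
    p_value <- 0.001

    # cat() with an explicit newline instead of fill = TRUE;
    # sep = "" stops cat() from inserting a space before the "\n",
    # so the spaces are written into the string itself
    cat(min_index, " with p-value ", p_value, "\n", sep = "")

    # message() joins its arguments without a separator, appends a newline,
    # and writes to stderr rather than stdout
    message(min_index, " with p-value ", p_value)
    ```

    cat() is usually the better fit for results you want on standard output, while message() suits progress or diagnostic text that callers may want to suppress with suppressMessages().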