r, vector

How can I extract values from multiple columns and join them in a single element, separated by text characters?


The question might be a bit cryptic, so let me explain. Here's a dataframe for illustration, with a country column, a column with a measurement, and two columns with the lower and upper bounds of a confidence interval calculated around that measurement.

country <- c('A','B','C','D','E')
var <- c(8,2,1,5,3)
CIlb <- c(7,1,1,4,2)
CIub <- c(9,3,1,4,1)
df <- cbind.data.frame(country, var, CIlb, CIub)

I am now trying to construct a dataframe with two columns: the country column, and a column that has "var (CIlb-CIub)". I then want to make a nice table from it with the flextable package.

I tried producing a vector that contains those values, but in order to join the data in this format I need to extract var, then switch to text characters for the space and the open bracket, then extract CIlb, then text again for the dash, then extract a value again, and then the closed bracket. Is there a way to do this?

In this sample, and in fact also in the data I am working with, it would be quicker to generate the vector by hand. But I have a deep distrust of copying values by hand: I'd need to generate the vectors and then check five times that I had copied all the values correctly. Plus, at some point I might work with bigger dataframes, which would make that job really tedious.

Is there a way to do what I want to do?


Solution

  • You can insert data into a formatted string with sprintf:

    df$new_col <- with(df, sprintf('%0.1f (%0.1f - %0.1f)', var, CIlb, CIub))
    
      country var CIlb CIub         new_col
    1       A   8    7    9 8.0 (7.0 - 9.0)
    2       B   2    1    3 2.0 (1.0 - 3.0)
    3       C   1    1    1 1.0 (1.0 - 1.0)
    4       D   5    4    4 5.0 (4.0 - 4.0)
    5       E   3    2    1 3.0 (2.0 - 1.0)
    

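    For measurements and confidence intervals, I would usually display the values as floats rounded to a fixed number of decimal places (%0.1f here). You could also format them as integers with %i, but this works only when every value is a whole number; sprintf raises an error otherwise: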

    df$new_col <- with(df, sprintf('%i (%i - %i)', var, CIlb, CIub))
    
      country var CIlb CIub   new_col
    1       A   8    7    9 8 (7 - 9)
    2       B   2    1    3 2 (1 - 3)
    3       C   1    1    1 1 (1 - 1)
    4       D   5    4    4 5 (4 - 4)
    5       E   3    2    1 3 (2 - 1)
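
To get from here to the two-column dataframe the question asks for, one option is to build a new dataframe from the country column and the formatted string. The following is a minimal sketch rather than the answerer's own code: the names `tab`, `estimate`, and `tab_alt` are made up for illustration, and the format string drops the spaces around the dash to match the "var (CIlb-CIub)" layout in the question. It also shows `paste0()` as an alternative to `sprintf()`, and assumes the flextable package is installed for the last step.

```r
# Sample data from the question
df <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  var     = c(8, 2, 1, 5, 3),
  CIlb    = c(7, 1, 1, 4, 2),
  CIub    = c(9, 3, 1, 4, 1)
)

# Keep only the country and the combined "var (CIlb-CIub)" string.
# `tab` and `estimate` are illustrative names, not from the original post.
tab <- data.frame(
  country  = df$country,
  estimate = sprintf("%.1f (%.1f-%.1f)", df$var, df$CIlb, df$CIub)
)

# paste0() glues the pieces together without a format template;
# numbers are converted with their default text representation.
tab_alt <- paste0(df$var, " (", df$CIlb, "-", df$CIub, ")")

print(tab)

# Optional: turn the two-column dataframe into a flextable,
# skipped if the package is not installed.
if (requireNamespace("flextable", quietly = TRUE)) {
  ft <- flextable::flextable(tab)
  ft <- flextable::autofit(ft)  # size columns to fit their contents
}
```

`sprintf()` gives control over the number of decimal places, while `paste0()` is shorter when the default number formatting is acceptable. In an interactive session, evaluating `ft` should display the table.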