Search code examples
rtransformationsapplycbindtibble

How to keep all variables using sapply?


I have a tbl (EU_28), the head of which looks like this:

    # A tibble: 5 x 22
  `Member State` `1997` `1998` `1999` `2000` `2001` `2002` `2003` `2004` `2005` `2006` `2007`
  <chr>           <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Belgium        161621 171026 165557 179381 174181 173824 181110 187889 206539 218941 236320
2 Bulgaria           NA     NA     NA     NA  20192  20390  21358  23125  24841  27513  24900
3 Croatia            NA     NA     NA  16886  19056  18584  20320  25246  26201  26325  30097
4 Cyprus             NA     NA     NA     NA     NA   7220   7258   6837   7305   7676   7516
5 Denmark        124010 104966  97213  96533  93972  94283 103954 100373  99688 107674 109660
# ... with 10 more variables: `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>,
#   `2012` <dbl>, `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>

I would like to transform the numeric value of each variable ('000s of tonnes), on every row, except for the first variable 'Member State', by dividing each by 1000 to convert to 'millions of tonnes'.

So far I have tried using sapply() in conjunction with my own function to transform each column.

First I created a 'convert_to_millions' function:

convert_to_millions <- function(x){x/1000}

I then applied this to EU_28 using sapply():

EU_28_Mtonnes <- tbl_df(sapply(EU_28, convert_to_millions))
Error in x/1000 : non-numeric argument to binary operator
Called from: FUN(X[[i]], ...)

This threw an error that I don't understand so I tried applying sapply() only to the numeric variables of my tbl:

EU_28_Mtonnes <- tbl_df(sapply(EU_28[,2:length(EU_28)], convert_to_millions))

> head(EU_28_Mtonnes)
# A tibble: 6 x 21
  `1997` `1998` `1999` `2000` `2001` `2002` `2003` `2004` `2005` `2006` `2007` `2008` `2009`
   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1   162.   171.  166.   179.   174.  174.   181.   188.   207.   219.   236.   244.   203.  
2    NA     NA    NA     NA     20.2  20.4   21.4   23.1   24.8   27.5   24.9   26.6   21.9 
3    NA     NA    NA     16.9   19.1  18.6   20.3   25.2   26.2   26.3   30.1   29.2   23.4 
4    NA     NA    NA     NA     NA     7.22   7.26   6.84   7.30   7.68   7.52   7.96   6.81
5   124.   105.   97.2   96.5   94.0  94.3  104.   100.    99.7  108.   110.   106.    90.6 
6    NA     NA    NA     NA     40.4  44.7   47.0   44.8   46.5   50.0   45.0   36.2   38.5 
# ... with 8 more variables: `2010` <dbl>, `2011` <dbl>, `2012` <dbl>, `2013` <dbl>,
#   `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>

I succeeded in transforming each variable as wanted but I lost the first, crucial, variable 'Member State'.

How can I keep the first variable whilst using sapply()? Is this even the correct method to use in the first place?


Solution

  • You can use mutate_if in dplyr to specify to only perform the conversion on variables meeting certain conditions (that it is a number for example):

    iris %>% mutate_if(is.numeric,funs(./1000))
        Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    1         0.0051      0.0035       0.0014      0.0002     setosa
    2         0.0049      0.0030       0.0014      0.0002     setosa
    3         0.0047      0.0032       0.0013      0.0002     setosa
    ...