I have a tbl (EU_28), the head of which looks like this:
# A tibble: 5 x 22
`Member State` `1997` `1998` `1999` `2000` `2001` `2002` `2003` `2004` `2005` `2006` `2007`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Belgium 161621 171026 165557 179381 174181 173824 181110 187889 206539 218941 236320
2 Bulgaria NA NA NA NA 20192 20390 21358 23125 24841 27513 24900
3 Croatia NA NA NA 16886 19056 18584 20320 25246 26201 26325 30097
4 Cyprus NA NA NA NA NA 7220 7258 6837 7305 7676 7516
5 Denmark 124010 104966 97213 96533 93972 94283 103954 100373 99688 107674 109660
# ... with 10 more variables: `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>,
# `2012` <dbl>, `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>
I would like to transform the numeric value of each variable ('000s of tonnes), on every row, except for the first variable 'Member State', by dividing each by 1000 to convert to 'millions of tonnes'.
So far I have tried using sapply() in conjunction with my own function to transform each column.
First I created a 'convert_to_millions' function:
convert_to_millions <- function(x){x/1000}
I then applied this to EU_28 using sapply():
EU_28_Mtonnes <- tbl_df(sapply(EU_28, convert_to_millions))
Error in x/1000 : non-numeric argument to binary operator
Called from: FUN(X[[i]], ...)
This threw an error that I don't understand so I tried applying sapply() only to the numeric variables of my tbl:
EU_28_Mtonnes <- tbl_df(sapply(EU_28[,2:length(EU_28)], convert_to_millions))
> head(EU_28_Mtonnes)
# A tibble: 6 x 21
`1997` `1998` `1999` `2000` `2001` `2002` `2003` `2004` `2005` `2006` `2007` `2008` `2009`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 162. 171. 166. 179. 174. 174. 181. 188. 207. 219. 236. 244. 203.
2 NA NA NA NA 20.2 20.4 21.4 23.1 24.8 27.5 24.9 26.6 21.9
3 NA NA NA 16.9 19.1 18.6 20.3 25.2 26.2 26.3 30.1 29.2 23.4
4 NA NA NA NA NA 7.22 7.26 6.84 7.30 7.68 7.52 7.96 6.81
5 124. 105. 97.2 96.5 94.0 94.3 104. 100. 99.7 108. 110. 106. 90.6
6 NA NA NA NA 40.4 44.7 47.0 44.8 46.5 50.0 45.0 36.2 38.5
# ... with 8 more variables: `2010` <dbl>, `2011` <dbl>, `2012` <dbl>, `2013` <dbl>,
# `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>
I succeeded in transforming each variable as wanted but I lost the first, crucial, variable 'Member State'.
How can I keep the first variable whilst using sapply()? Is this even the correct method to use in the first place?
You can use mutate_if
in dplyr to specify to only perform the conversion on variables meeting certain conditions (that it is a number for example):
iris %>% mutate_if(is.numeric,funs(./1000))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 0.0051 0.0035 0.0014 0.0002 setosa
2 0.0049 0.0030 0.0014 0.0002 setosa
3 0.0047 0.0032 0.0013 0.0002 setosa
...