I am trying to write a function which uses gsub to find and replace certain strings. I have 10+ excel spreadsheets with a list of monetary values using standard Swedish separators for thousands and decimals, i.e. a decimal is noted by a comma and not a period, and the thousands separator is a space and not a comma. What in English would be 1,000.31 becomes 1 000,31 in Swedish? I am trying to write a function that will eliminate the space where the thousands separator is, and find the comma and replace it with a period.
Normal gsub functions work fine:
df$Saldo <- gsub(",", ".", df$Saldo)
df$Saldo <- gsub(" ", "", df$Saldo)
However, I will need to do this on 10+ dataframes, so I thought it would be more efficient to write a function that could be applied to each dataframe (The function I have tried to write plus a reproducible example are provided below). This is where I run into problems. I should also mention I am looking for a dplyr solution. The error I get with the function (below) is Error: Quosures can only be unquoted within a quasiquotation context.
I'll admit that, despite having read and reread about non-standard evaluation and quasi-quotation, I'm still having difficulty understanding these concepts. I have read Programming with dplyr
https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html.
I have also read:
https://rlang.r-lib.org/reference/quasiquotation.html.
Finally, I have studied several vaguely similar problems/solutions on stack overflow, but the solutions there don’t seem to work for me. Most of the problems deal with getting the regular expression to work, and not getting regular expressions to work inside a function.
Here is the function, such as it is:
library(dplyr)
replace_commas <- function(df1, c_name) {
c_name <- enquo(c_name)
df1 <- df1 %>% gsub(",", ".",!! c_name)
return(df1)
}
And here is a reproducible example:
df <- data.frame(Date = c("2018-12-15", "2018-11-04", "2018-10-26"), Saldo = c("123 432,45", "87 546,23", "9 564,82"))
df_new <- replace_commas(df, Saldo)
The immediate thing I want to achieve is a function that finds and replaces commas with periods. However, I would also appreciate a pointer on how to also include a regular expression in the function for removing unneeded spaces.
The following version of your code works to remove commas and replace them with decimals and also to remove spaces but you might want to use R's builtin facilities instead as shown further down.
library(dplyr)
as.swedish <- function(from) as.numeric(gsub(" ", "", sub(",", ".", from)))
replace_commas <- function(df1, c_name) {
c_name <- enquo(c_name)
df1 %>% mutate_at(vars(!!c_name), as.swedish)
}
df %>% replace_commas(Saldo)
The following assume that there are spaces as thousands separators and commas as decimals except 1a which only assumes commas as decimals.
1) read.csv2 We can define a swedish
class and then use it in read.csv2
. This uses the one-line function as.swedish
defined above.
# test data
Lines <- "Letter;Number\nA;1 200,3\nB;32\nC;2511,01"
# define swedish class
setClass("swedish")
setAs("character", "swedish", as.swedish)
setAs("factor", "swedish", as.swedish)
read.csv2(text = Lines, colClasses = list(Number = "swedish"))
giving:
Letter Number
1 A 1200.30
2 B 32.00
3 C 2511.01
1a) If we don't have spaces as thousands separators and only have commas as decimals then we can use read.csv2:
Lines2 <- "Letter;Number\nA;1200,3\nB;32\nC;2511,01"
read.csv2(text = Lines2)
giving:
Letter Number
1 A 1200.30
2 B 32.00
3 C 2511.01
2) convert df or using swedish
class from above to convert Saldo
in df
:
transform(df, Saldo = as(Saldo, "swedish"))
giving:
Date Saldo
1 2018-12-15 123432.45
2 2018-11-04 87546.23
3 2018-10-26 9564.82
2a) This also works and only uses only the as.swedish
function from above but not the swedish
class.
transform(df, Saldo = as.swedish(Saldo))
or almost the same with dplyr df %>% mutate(Saldo = as.swedish(Saldo))
3) function to convert df or define this function where names
is a character vector of column names to convert from Swedish. Neither quasi-quotation nor corresponding builtin R facilities are needed if we pass the names as character strings which seems like a better design anyways. This uses the as.swedish
function from above but not the swedish
class.
swedish <- function(data, names) replace(data, names, lapply(data[names], as.swedish))
swedish(df, "Saldo")
or
library(dplyr)
df %>% swedish("Saldo")