Search code examples
rcaser-factor

Mutate factor to new variable in case_when() statement


Data Setup

I have a data set that appears somewhat like this simple dataframe below:

CAD_EXCHANGE <- 1.34
EUR_EXCHANGE <- 0.88
 
df <- tibble(
  shipment = c("A", "B", "C", "D", "E"),
  invoice = c(rep(500, 5)),
  currency = factor(c("USD", "EUR", "CAD", NA, "SDD"))
)
 
df
# A tibble: 5 x 3
  shipment invoice currency
  <chr>      <dbl> <fct>   
1 A            500 USD     
2 B            500 EUR     
3 C            500 CAD     
4 D            500 NA      
5 E            500 SDD     

levels(df$currency)
[1] "CAD" "EUR" "SDD" "USD"

End Goal

I am trying to convert the invoices to USD for some common other currencies (EUR and CAD), but not all of them or if data is missing (i.e., SDD and NA). My final data frame should look like this:

# A tibble: 5 x 5
  shipment invoice currency invoice_converted currency_converted
  <chr>      <dbl> <fct>                <dbl> <fct>             
1 A            500 USD                    500 USD               
2 B            500 EUR                    568 USD               
3 C            500 CAD                    373 USD               
4 D            500 NA                     500 NA                
5 E            500 SDD                    500 SDD               

Trial 1 -- Does Not Work

In the future, I may have more than just these few currencies to convert, so I applied a case_when() statement. This was my first attempt:

df_USD1 <- df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(currency == "EUR" ~ "USD",
                                   currency == "CAD" ~ "USD",
                                   TRUE ~ currency)
  )

Error: Problem with `mutate()` column `currency_converted`.
i `currency_converted = case_when(...)`.
x must be a character vector, not a `factor` object.

With the above, I understand that I'm mixing character and factor in the assignment to currency_converted because I have the default TRUE ~ currency (and currency is a factor). So I tried using only factors for the assignment...

Trial 2 -- Works, but not in a reliable way

df_USD2 <- df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(
      currency == "EUR" ~ currency[1],
      currency == "CAD" ~ currency[1],
      TRUE ~ currency)
  )

It works, but only because in my setup for this question, USD is in the first position, and I cannot rely on that.

> df$currency
[1] USD  EUR  CAD  <NA> SDD 
Levels: CAD EUR SDD USD

Trial 3 -- Doesn't work

I thought I could try some other way of getting at the factor with subsetting, but this doesn't work:

df_USD3 <- df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(
      currency == "EUR" ~ df$currency[df$currency == "USD"],
      currency == "CAD" ~ df$currency[df$currency == "USD"],
      TRUE ~ currency
    )
  )

Error: Problem with `mutate()` column `currency_converted`.
i `currency_converted = factor(...)`.
x `currency == "EUR" ~ df$currency[df$currency == "USD"]`, `currency == "CAD" ~ df$currency[df$currency == "USD"]` must be length 5 or one, not 2.
Run `rlang::last_error()` to see where the error occurred.

And it seems to be that it's because of the NA that gets returned...

> df$currency[df$currency == "USD"]
[1] USD  <NA>
Levels: CAD EUR SDD USD

...because if I go back to my original df and replace that NA with some other currency, it would work -- but obviously I need to be able to keep NA where it belongs.

I feel like there's some very good way to do this, but I'm missing it despite reading up on factors and trying different things. Help?


Solution

  • case_when doesn't do the type conversion automatically - i.e. currency is factor whereas the returns from other conditions in case_when is just character. So, we can force convert the currency to character to make all the returns same class and it should work

    library(dplyr)
    df %>%
      mutate(
        invoice_converted = case_when(
          currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
          currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
          TRUE ~ invoice
        ), currency_converted = case_when(currency == "EUR" ~ "USD",
                                       currency == "CAD" ~ "USD",
                                       TRUE ~ as.character(currency)))
    

    -output

    # A tibble: 5 × 5
      shipment invoice currency invoice_converted currency_converted
      <chr>      <dbl> <fct>                <dbl> <chr>             
    1 A            500 USD                    500 USD               
    2 B            500 EUR                    568 USD               
    3 C            500 CAD                    373 USD               
    4 D            500 <NA>                   500 <NA>              
    5 E            500 SDD                    500 SDD             
    

    If we want to keep it as a factor, either wrap with factor after the case_when or directly use fct_recode instead of case_when

    library(forcats)
    df %>%
      mutate(
        invoice_converted = case_when(
          currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
          currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
          TRUE ~ invoice
        ), currency_converted = fct_recode(currency, USD = "EUR", USD = "CAD"))
    

    -output

    # A tibble: 5 × 5
      shipment invoice currency invoice_converted currency_converted
      <chr>      <dbl> <fct>                <dbl> <fct>             
    1 A            500 USD                    500 USD               
    2 B            500 EUR                    568 USD               
    3 C            500 CAD                    373 USD               
    4 D            500 <NA>                   500 <NA>              
    5 E            500 SDD                    500 SDD