Search code examples
rdplyracross

How to handle non equally-sized data frames


In dplyr/across, below code show error "...defined for equally-sized data frames". This caused by no profit_2023 to match with existing variable profit_2022.How to handle it ?(I want the no match the result and return NA or blank)

library(tidyverse)

raw_data_2 <- data.frame(category=c('A','B','C'),
                       sales_2022=c(1,2,3),
                       profit_2022=c(0.1,0.4,0.3),
                       margin_2022=c(0.2,0.8,0.6),
                       sales_2023=c(1.5,3,4),
                       margin_2023=c(0.3,0.7,0.5)
                       )

raw_data_2 %>% mutate((across(contains("_2023"),.names="{col}_increase") / across(contains("_2022"))-1))

Solution

  • The main issue with your data is that it's quite untidy. Clean the data, and it becomes a lot more straightforward:

    raw_data_2 |>
      pivot_longer(cols = -category, names_to = c(".value", "year"), names_sep = "_") |> 
      mutate(across(c(sales, profit, margin), \(x) x/lag(x)-1, .names = "{.col}_increase"), .by = category)
    

    Output:

    # A tibble: 6 × 8
      category year  sales profit margin sales_increase profit_increase
      <chr>    <chr> <dbl>  <dbl>  <dbl>          <dbl>           <dbl>
    1 A        2022    1      0.1    0.2         NA                  NA
    2 A        2023    1.5   NA      0.3          0.5                NA
    3 B        2022    2      0.4    0.8         NA                  NA
    4 B        2023    3     NA      0.7          0.5                NA
    5 C        2022    3      0.3    0.6         NA                  NA
    6 C        2023    4     NA      0.5          0.333              NA
    # ℹ 1 more variable: margin_increase <dbl>
    

    Notes:

    1. Untidy data is hard to use, and narrows your options until you're forced into doing things the jankiest way possible. There's a bunch of reasons why keeping your data tidy is a good idea (you can find many here), but maybe the best one is your own sanity!
    2. The profit increase column is all NAs, because the 2023 profit rows are all NAs, and there is no one prior to 2022, so the 2022 ones are NAs also.