Search code examples
rwrdswrds-compusat

How to create a cumulative variable that groups by PERMNO and arranges by date in R


I have a dataframe with variables from COMPUSTAT containing data on various accounting items, including SG&A expenses from different companies.

I want to create a new variable in the dataframe which accumulates the SG&A expenses for each company in chronological order. I use PERMNO codes as the unique ID for each company.

I have tried this code, however it does not seem to work:

crsp.comp2$cxsgaq <- crsp.comp2 %>%
  group_by(permno) %>%
  arrange(date) %>%
  mutate_at(vars(xsgaq), cumsum(xsgaq))

(xsgag is the COMPUSTAT variable for SG&A expenses)

Thank you very much for your help


Solution

  • Your example code is attempting write the entire dataframe crsp.comp2, into a variable crsp.comp2$cxsgaq.

    Usually the vars() function variables needs to be "quoted"; though in your situation, use the standard mutate() function and assign the cxsgaq variable there.

    crsp.comp2 <- crsp.comp2 %>%
      group_by(permno) %>%
      arrange(date) %>%
      mutate(cxsgaq = cumsum(xsgaq))
    

    Reproducible example with iris dataset:

    library(tidyverse)
    iris %>% 
      group_by(Species) %>% 
      arrange(Sepal.Length) %>% 
      mutate(C.Sepal.Width = cumsum(Sepal.Width))