
How to fix "replacement has x rows, data has z" in R


I have a dataset that contains the yearly sales of each company (company code = gvkey, year = fyearq, sales = realsales). After calculating the yearly growth rates for realsales, I try to insert them into the df. When I do so, I get the following error message: "Error in $<-.data.frame(*tmp*, growth_rate, value = c(10041 = NA, : replacement has 204072 rows, data has 204024".

I have already tried removing all NA values, along with other solutions found on this forum, but none of them worked.

The code fragment which is yielding this error:

rs <- rs[order(rs$gvkey, rs$fyearq, rs$realsales), ]

table(is.na(rs$realsales))

rs <- rs %>%
  group_by(gvkey) %>%
  filter(!any(is.na(realsales))) %>%
  ungroup()
rs$growth_rate <- NA

growth_rate <- function(x) {
  out <- c(NA, x[2:length(x)] / x[1:(length(x) - 1)])
  return(out)
}
rs$growth_rate <- do.call("c", by(rs$realsales, rs$gvkey, growth_rate))

It does create a value with all the 204072 elements if I only run

growth_rate <- do.call("c", by(rs$realsales,rs$gvkey, growth_rate))

I don't know if it points to anything but thought it was worth mentioning.

Everything works until it reaches the last line.

Another important point: this did not happen with the previous dataset. The new one has more observations but is otherwise the same, just bigger, and only now do I get this error. One difference is that I merged two data frames to convert nominal sales to real sales, which I had not done before. This is the segment where I do that:

df.gdpdeflator <- read.table("gdpdeflator.txt", header=TRUE)

real_sales <- left_join(sumofsalesbyfirm, df.gdpdeflator, by = "fyearq")
real_sales$realsales <- real_sales$saley/(real_sales$deflator/100)
rs <- aggregate(realsales~gvkey+fyearq, real_sales, sum)

Let me know if further information is required, I'll be happy to provide it.


Solution

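  • Use of 2:length(x) works fine as long as your x is length 2 or more. When x has length 1 (a gvkey with a single row), 2:length(x) is 2:1, that is c(2, 1), so growth_rate returns more values than the group has rows. That would explain why the combined result is longer than the data frame. I believe your intent is to get all but the first element, in which case all of these work: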

    x <- 1:10
    x[-1]
    x[ seq_len(length(x))[-1] ]
    tail(x, n=-1)
    # [1]  2  3  4  5  6  7  8  9 10
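
To connect this to the error message, here is a small sketch of what the question's function returns for a single-row group. The name growth_rate_orig is only a label used here for the original definition; the input values are made up.

```r
# The function from the question, unchanged apart from the name.
growth_rate_orig <- function(x) {
  c(NA, x[2:length(x)] / x[1:(length(x) - 1)])
}

# A group with three rows gives three values, as intended.
length(growth_rate_orig(c(100, 110, 121)))

# A group with one row also gives three values: 2:1 indexes
# element 2 (NA) and element 1, so two extra values appear.
length(growth_rate_orig(100))
```

If this is what happens in your data, every gvkey with a single row adds two extra values. The difference in the error, 204072 - 204024 = 48, would then correspond to 24 such firms; that is an inference from the numbers in the question, not something I could check.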
    

    Let me formalize this a little to show several options (wrong and right) and show some output.

    allbutfirst <- function(n) {
      sapply(list(
        wrong1 = 2:length(n),
        wrong2 = n[ 2:length(n) ],
        right1 = n[ -1 ],
        right2 = n[ seq_len(length(n))[-1] ],
        right3 = tail(n, n=-1)
      ), paste, collapse = ",")
    }
    
    allbutlast <- function(m) {
      sapply(list(
        wrong1 = 1:(length(m)-1),
        wrong2 = m[ 1:max(0, length(m)-1) ],
        right1 = m[ -length(m) ],
        right2 = m[ seq_len(max(0, length(m) - 1)) ],
        right3 = head(m, n=-1)
      ), paste, collapse = ",")
    }
    allbutfirst(1:5)
    #    wrong1    wrong2    right1    right2    right3 
    # "2,3,4,5" "2,3,4,5" "2,3,4,5" "2,3,4,5" "2,3,4,5" 
    cat(paste(allbutfirst(1:5), collapse = "\n"))
    # 2,3,4,5
    # 2,3,4,5
    # 2,3,4,5
    # 2,3,4,5
    # 2,3,4,5
    cat(paste(allbutfirst(1), collapse = "\n"))
    # 2,1
    # NA,1
    # 
    # 
    # 
    

    (The wrong labels are there because they go wrong when the length is not 2 or more ...)

    The "2,3,4,5" means the returned vector is length four, iterating from 2 to 5. The "2,1" means length two, decrementing from 2 to 1 (when we did not mean to do so). Of course, the NA is just not right.
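
The counting-down behaviour of the colon operator can be checked directly at the console. A minimal sketch with a length-1 vector (x here is just an illustrative value):

```r
x <- 1                     # a vector of length 1

a <- 2:length(x)           # 2:1 counts down, giving two indices
b <- 1:(length(x) - 1)     # 1:0 also counts down
s <- seq_len(length(x) - 1)  # seq_len(0) is empty

a
b
s
x[a]                       # index 2 is out of range, so an NA appears
```

The colon operator never produces an empty sequence, which is why it is risky whenever the upper bound can drop below the lower one.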

    The empty lines there are relevant: they mean the input had fewer than 2 elements and nothing was returned (which is what we want). In the tables below I show those empty strings as "" so they stand out, but they really are empty, as they should be.

    So this "table" compares the different methods:

                                allbutfirst(x)     allbutlast(x)
    
    x <- 1:5         wrong1     2,3,4,5            1,2,3,4
                     wrong2     2,3,4,5            1,2,3,4
                     right1     2,3,4,5            1,2,3,4
                     right2     2,3,4,5            1,2,3,4
                     right3     2,3,4,5            1,2,3,4
    
    

    So far so good, no harm yet.

                                allbutfirst(x)     allbutlast(x)
    
    x <- 1           wrong1     2,1                1,0            <-- length 2, expected none
                     wrong2     NA,1               1              <-- length 2 or 1, expected 0
                     right1     ""                 ""   
                     right2     ""                 ""   
                     right3     ""                 ""
    
    x <- integer(0)  wrong1     2,1,0              1,0,-1         <-- length 3? negative?
                     wrong2     NA,NA              NA             <-- all wrong
                     right1     ""                 ""
                     right2     ""                 ""
                     right3     ""                 ""
    

    Moral of the story:

    • use of head and tail with negative counts works well
    • use of x[-1] and x[-length(x)] gives the same results as tail(x, n=-1) and head(x, n=-1), and still works well for vectors of length 0 or 1
    • seq_len(max(0, ...)) is a safe way of doing things; seq_len(0) will always be empty, 1:0 will not.
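
Putting this together, a hedged sketch of how growth_rate from the question could be rewritten with head and tail. The toy data frame is made up for illustration, and using ave() instead of do.call("c", by(...)) is my substitution, not something from the question; ave() puts each group's result back in the rows it came from.

```r
# Ratio of each value to the previous one; the first is NA.
# Always returns exactly length(x) values.
growth_rate <- function(x) {
  if (length(x) == 0) return(numeric(0))
  c(NA_real_, tail(x, n = -1) / head(x, n = -1))
}

# Toy data: gvkey 2 has a single row, the case that broke before.
rs <- data.frame(
  gvkey     = c(1, 1, 1, 2, 3, 3),
  fyearq    = c(2001, 2002, 2003, 2001, 2001, 2002),
  realsales = c(100, 110, 121, 50, 200, 100)
)

# How many groups have exactly one row?
n_single <- sum(table(rs$gvkey) == 1)

# One value per row, computed within each gvkey.
rs$growth_rate <- ave(rs$realsales, rs$gvkey, FUN = growth_rate)
rs
```

Like the original, this assumes rs is already sorted by fyearq within each gvkey.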