
Accessing data not existing in R dataframe


I have a dataframe named productCheck:

prod <- c("GAS","GAS","GLP","GLP","GNV")
monthYear <- c("2016-06-01","2016-07-01","2016-06-01","2016-07-01","2016-07-01")
meanValue <- c(3,5,8,1,6)
price <- c(0,0,0,0,0)

productCheck <- data.frame(prod,monthYear,meanValue,price)
productCheck$prod <- as.factor(productCheck$prod)
productCheck$monthYear <- as.factor(productCheck$monthYear)

When I execute the following loop, I get an error:

for (j in levels(productCheck$prod))
{
  firstPeriod <- NA
  for (k in levels(productCheck$monthYear))
  {
    if (!is.na(firstPeriod))
    {
      secondPeriod <- k
      productCheck[productCheck$monthYear==secondPeriod & productCheck$prod==j,]$price <- 
        100*(productCheck[productCheck$monthYear==secondPeriod & productCheck$prod==j,]$meanValue - 
             productCheck[productCheck$monthYear==firstPeriod & productCheck$prod==j ,]$meanValue) /
             productCheck[productCheck$monthYear==firstPeriod & productCheck$prod==j ,]$meanValue
    }
    firstPeriod <- k
  }  
}

Error in `$<-.data.frame`(`*tmp*`, "price", value = numeric(0)) : replacement has 0 rows, data has 1

The problem is that the GNV product has no data for the period "2016-06-01", so the subset for the first period is empty and the right-hand side evaluates to numeric(0). How can I avoid this error?
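To see where the empty value comes from, the failure can be reproduced on a single-row data frame. This is a minimal sketch; `gnv`, `first`, `second`, `change` and `err` are hypothetical names, not part of the original code:

```r
# Hypothetical one-product data frame: GNV only has a row for 2016-07-01
gnv <- data.frame(prod = "GNV", monthYear = "2016-07-01",
                  meanValue = 6, price = 0)

# Subsetting a period that has no row returns a zero-length vector
first  <- gnv[gnv$monthYear == "2016-06-01", ]$meanValue
second <- gnv[gnv$monthYear == "2016-07-01", ]$meanValue
print(first)          # numeric(0)

# Arithmetic with a zero-length operand is also zero-length
change <- 100 * (second - first) / first
print(length(change)) # 0

# Assigning that zero-length value to a one-row subset raises the error
err <- tryCatch({
  gnv[gnv$monthYear == "2016-07-01", ]$price <- change
  NULL
}, error = function(e) conditionMessage(e))
print(err)
```

The last `print()` shows the same "replacement has 0 rows, data has 1" message as in the loop.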


Solution

  • Your loop-based code is longer than it needs to be and, as you have shown, it fails whenever a product has no row for a period. There are several alternatives; one of them is to reshape the data so that each month becomes a column:

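    library(tidyverse)
    productCheck %>% 
      # one column per month; a month with no row for a product becomes NA
      pivot_wider(names_from = monthYear, values_from = meanValue) %>% 
      # percent change from June to July; NA propagates instead of raising an error
      mutate(price = 100*(`2016-07-01` - `2016-06-01`)/`2016-06-01`)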
    
    # A tibble: 3 x 4
      prod  price `2016-06-01` `2016-07-01`
      <fct> <dbl>        <dbl>        <dbl>
    1 GAS    66.7            3            5
    2 GLP   -87.5            8            1
    3 GNV    NA             NA            6
    
    

    Your original data:

    prod <- c("GAS", "GAS", "GLP", "GLP", "GNV")
    monthYear <- c("2016-06-01", "2016-07-01", "2016-06-01", "2016-07-01", "2016-07-01")
    meanValue <- c(3, 5, 8, 1, 6)
    price <- c(0, 0, 0, 0, 0)
    productCheck <- data.frame(prod, monthYear, meanValue, price, stringsAsFactors = TRUE)
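If you would rather keep the loop, a minimal fix is to look up both periods first and only assign when each lookup returned exactly one value. This is a sketch under stated assumptions: it uses the question's `productCheck` data (with `price` and factor columns), assumes the `monthYear` levels sort chronologically (true for "YYYY-MM-DD" strings), and the names `periods`, `prev`, `curr` and `rows` are illustrative:

```r
prod <- c("GAS", "GAS", "GLP", "GLP", "GNV")
monthYear <- c("2016-06-01", "2016-07-01", "2016-06-01", "2016-07-01", "2016-07-01")
meanValue <- c(3, 5, 8, 1, 6)
price <- c(0, 0, 0, 0, 0)
productCheck <- data.frame(prod, monthYear, meanValue, price, stringsAsFactors = TRUE)

# Factor levels of "YYYY-MM-DD" strings are in chronological order
periods <- levels(productCheck$monthYear)

for (j in levels(productCheck$prod)) {
  # Walk over consecutive (previous, current) period pairs
  for (i in seq_along(periods)[-1]) {
    firstPeriod  <- periods[i - 1]
    secondPeriod <- periods[i]
    prev <- productCheck$meanValue[productCheck$prod == j &
                                     productCheck$monthYear == firstPeriod]
    curr <- productCheck$meanValue[productCheck$prod == j &
                                     productCheck$monthYear == secondPeriod]
    # Skip pairs where either month has no row (or more than one row)
    if (length(prev) == 1 && length(curr) == 1) {
      rows <- productCheck$prod == j & productCheck$monthYear == secondPeriod
      productCheck$price[rows] <- 100 * (curr - prev) / prev
    }
  }
}
productCheck
```

Unlike the `pivot_wider()` approach, this leaves GNV's `price` at its initial value of 0 rather than `NA`; initialise `price` to `NA` instead if a missing previous month should be distinguishable from zero change.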