
How to shift the predictor in a glm function by one row in R?


I'm looking at the correlation between the presence of different species at different times using a glm. Right now I have a dataset with hours and presence/absence (1/0) of different species during those hours. It looks something like this (very simplified, of course):

Hour  Giraffe Elephant Lion
0       1       1       0
1       0       1       1
2       1       0       0
3       1       1       0
4       0       1       0
5       0       0       1

I use a model that looks somewhat like this:

giraffe_glm.fit <- glm(giraffe ~ scale(elephants) + scale(hour) + scale(lion), data = data_num, family = binomial)

But this of course compares sightings within the same hour (giraffe sightings during hour 1 against elephant and lion sightings during hour 1). I want to compare sightings against the hour before instead (e.g. giraffe sightings during hour 1 against elephant and lion sightings during hour 0).

I could try to add shifted columns to the dataset like this:

Hour  Giraffe Elephant Lion Elephant-1 Lion-1
0       1       1       0       NA       NA
1       0       1       1       1       0
2       1       0       0       1       1
3       1       1       0       0       0
4       0       1       0       1       0
5       0       0       1       1       0

and use them in the model, but I have a lot of variables and I hoped to find a way to work it into the model directly instead, like glm(giraffe ~ scale(elephants-1) + scale(hour) + scale(lion-1), data = data_num, family = binomial) or something similar. Is there a way to do this?


Solution

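  • One way is to create the lagged columns with dplyr's lag(), as @dufei already suggested, and then fit the model on those columns:

    library(dplyr)  # provides %>%, mutate() and lag()

    data_lagged <- data %>%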
      mutate(
        Elephant_lag = lag(Elephant),
        Lion_lag = lag(Lion)
      )
    
    glm_fit <- glm(Giraffe ~ scale(Elephant_lag) + scale(Lion_lag), data = data_lagged, family = binomial)
    summary(glm_fit)
    
    Call:
    glm(formula = Giraffe ~ scale(Elephant_lag) + scale(Lion_lag), 
        family = binomial, data = data_lagged)
    
    Deviance Residuals: 
             2           3           4           5           6  
    -6.547e-06   6.547e-06   6.547e-06  -6.547e-06  -6.547e-06  
    
    Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
    (Intercept)            -4.913  58589.772       0        1
    scale(Elephant_lag)   -21.973  67653.634       0        1
    scale(Lion_lag)        21.973  67653.664       0        1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 6.7301e+00  on 4  degrees of freedom
    Residual deviance: 2.1434e-10  on 2  degrees of freedom
      (1 observation deleted due to missingness)
    AIC: 6
    
    Number of Fisher Scoring iterations: 23
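
Since the question asks for a way to put the shift into the model call itself, here is a minimal sketch of two variants, assuming the rows are sorted by Hour with no missing hours (lag() shifts by row position, not by the Hour value). The toy data frame is copied from the question's example table; the names `lag_vars`, `fit_inline` and `fit_across` are only illustrative. Note that `elephants-1` inside a formula call such as scale() would subtract 1 from the values rather than shift the rows, which is why lag() is needed.

```r
library(dplyr)

# Toy data copied from the example table in the question
data <- data.frame(
  Hour     = 0:5,
  Giraffe  = c(1, 0, 1, 1, 0, 0),
  Elephant = c(1, 1, 0, 1, 1, 0),
  Lion     = c(0, 1, 0, 0, 0, 1)
)

# Variant 1: call lag() directly inside the formula.
# The first row becomes NA and is dropped by glm()'s default NA handling.
fit_inline <- glm(
  Giraffe ~ scale(dplyr::lag(Elephant)) + scale(dplyr::lag(Lion)),
  data = data, family = binomial
)

# Variant 2: for many predictors, lag them all at once with across()
# and build the formula from the generated column names.
lag_vars <- c("Elephant", "Lion")  # illustrative: list your own predictors here
data_lagged <- data %>%
  mutate(across(all_of(lag_vars), dplyr::lag, .names = "{.col}_lag"))

form <- reformulate(
  paste0("scale(", lag_vars, "_lag)"),
  response = "Giraffe"
)
fit_across <- glm(form, data = data_lagged, family = binomial)
```

Both variants should fit the same model as the mutate() version above. With only five usable rows the toy data is perfectly separated, which is what the huge standard errors and near-zero residual deviance in the summary output indicate, so glm() will likely emit warnings and the estimates are only meaningful on the real dataset.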