Search code examples
rdplyrcountindicator

How to count how often indicator variables change across variables?


I have a data frame df of car IDs and whether this ID entered and exited a parking garage in a given city in a given year:

ID City day1 day2 day3 day4 day5 day6 day7
1  NYC  1    0    1    1    0    1    1
2  BOS  0    1    1    0    0    1    0
3  SFO  1    1    0    0    0    0    0
4  SFO  1    1    1    1    1    1    1
4  NYC  1    0    0    1    0    1    1

What I want is a measure of the number of times a car entered and exited a given garage in a city across all days. For id == 1 this would mean: day1->day2 gives 1 exit, day2->day3 gives 1 entry, day3->day4 gives no entries or exits, day4->day5 gives 1 exit, day5->day6 gives one 1 entry, day6->day7 no entries or exits. In total that means 2 entries and 2. I would like these two numbers in my table:

ID City day1 day2 day3 day4 day5 day6 day7 entries exits
1  NYC  1    0    1    1    0    1    1    2       2
2  BOS  0    1    1    0    0    1    0    2       2
3  SFO  1    1    0    0    0    0    0    0       1
4  SFO  1    1    1    1    1    1    1    0       0
4  NYC  1    0    0    1    0    1    1    2       2

In other words, entries = count(0->1), exits = count(1->0)

Does anyone know how I can code this up?

I was thinking of looping through the day variables and creating a difference variable, e.g. diff1.2 = day2-day1 and then counting how many ones (for entries) and negative ones (for exits) there are. This seems overly complicated.

Does anyone have a more elegant solution?


Solution

  • You could try this using one row of your example:

    Data <- c(1,0,1,1,0,1,1)
    DataDiff <- diff(Data)
    

    Which produces: -1, 1, 0, -1, 1, 0 In a pair where zero precedes 1 (entry) you end up with -1, and vice versa:

    Entry <- sum(DataDiff == -1)
    Exit <- sum(DataDiff == 1)
    

    If you try this in dplyr you'll have to use the rowwise() operator. Something like:

    library(dplyr)
    df <- df %>% rowwise() %>% mutate(
    Entries = sum(diff(c(day1, day2, day3, day4, day5))==-1),
    Exits = sum(diff(c(day1, day2, day3, day4, day5))==1))