I have a data frame df
of car IDs
and whether this ID
entered and exited a parking garage in a given city in a given year:
ID City day1 day2 day3 day4 day5 day6 day7
1 NYC 1 0 1 1 0 1 1
2 BOS 0 1 1 0 0 1 0
3 SFO 1 1 0 0 0 0 0
4 SFO 1 1 1 1 1 1 1
4 NYC 1 0 0 1 0 1 1
What I want is a measure of the number of times a car entered and exited a given garage in a city across all days.
For id == 1
this would mean: day1->day2 gives 1 exit, day2->day3 gives 1 entry, day3->day4 gives no entries or exits, day4->day5 gives 1 exit, day5->day6 gives one 1 entry, day6->day7 no entries or exits. In total that means 2 entries and 2. I would like these two numbers in my table:
ID City day1 day2 day3 day4 day5 day6 day7 entries exits
1 NYC 1 0 1 1 0 1 1 2 2
2 BOS 0 1 1 0 0 1 0 2 2
3 SFO 1 1 0 0 0 0 0 0 1
4 SFO 1 1 1 1 1 1 1 0 0
4 NYC 1 0 0 1 0 1 1 2 2
In other words, entries = count(0->1)
, exits = count(1->0)
Does anyone know how I can code this up?
I was thinking of looping through the day variables and creating a difference variable, e.g. diff1.2 = day2-day1
and then counting how many ones (for entries) and negative ones (for exits) there are. This seems overly complicated.
Does anyone have a more elegant solution?
You could try this using one row of your example:
Data <- c(1,0,1,1,0,1,1)
DataDiff <- diff(Data)
Which produces: -1, 1, 0, -1, 1, 0 In a pair where zero precedes 1 (entry) you end up with -1, and vice versa:
Entry <- sum(DataDiff == -1)
Exit <- sum(DataDiff == 1)
If you try this in dplyr you'll have to use the rowwise() operator. Something like:
library(dplyr)
df <- df %>% rowwise() %>% mutate(
Entries = sum(diff(c(day1, day2, day3, day4, day5))==-1),
Exits = sum(diff(c(day1, day2, day3, day4, day5))==1))