Tags: r, loops, categorical-data, levels

How can I find the levels of a factor that occurs before another specified level?


I have data on the arrival times of species at food. For each carcass, I want to determine which levels of breed occur before the breed_jackals and breed_hyenas levels, using the got.here value, which is the arrival time.

I only want the order, so in the first case, carcass_336, I'd get one value for the jackals: breed_eagles.

For the second carcass, carcass_338, I'd get two levels for the hyenas, breed_lappets and breed_eagles, in that order, and three levels for the jackals because a hyena arrives before them: breed_lappets, breed_eagles and breed_hyenas.

I thought arrivals$breed[arrivals$mycarcass=="carcass_336"] would work, but it gives me all the levels.

Ideally I'd also like to pick out which level occurs directly before the jackals and hyenas, using the minimum got.here for each. E.g. for carcass_338 it would be breed_eagles for breed_hyenas. The got.here value should be useful here, because I've already used it to extract the shortest arrival time for each carcass for another purpose with:

arrivals[ arrivals$got.here == ave(arrivals$got.here, arrivals$mycarcass, FUN=min), ]

Here's my data:

arrivals <- read.table(header=TRUE, stringsAsFactors=TRUE, text="
who     breed           got.here   mycarcass
167     breed_eagles    102        carcass_336
183     breed_eagles    108        carcass_336
181     breed_eagles    271        carcass_336
134     breed_eagles    284        carcass_336
191     breed_eagles    311        carcass_336
283     breed_jackals   5419       carcass_336
118     breed_lappets   200        carcass_338
198     breed_eagles    219        carcass_338
151     breed_eagles    256        carcass_338
206     breed_hyenas    1759       carcass_338
294     breed_jackals   7948       carcass_338
235     breed_hyenas    10988      carcass_338
215     breed_hyenas    13629      carcass_338
290     breed_jackals   17013      carcass_338")

The output I'd like would be derived from this: the frequencies of these occurrences, e.g. for jackals:

 preceding_breed    frequency
 breed_eagles         1
 breed_lappets        0
 breed_hyenas         1

Solution

  • Here is one way to count the arrivals by species prior to the first jackal arrival. There is probably a cleaner method. For clarity, I'm only going to show the solution for jackals, but getting the results for hyenas would be straightforward.

    # for each carcass, calculate the first jackal arrival
    first_jackals <- aggregate(got.here~mycarcass,
                               data=arrivals[arrivals$breed=="breed_jackals",], FUN=min)
    
    # tabulate the number of other animals arriving before the jackal;
    # factor() keeps every breed as a level, so each table has the same length
    beat_jackals <- sapply(unique(arrivals$mycarcass), function(i) {
            table(factor(arrivals$breed)[arrivals$mycarcass==i & 
                  arrivals$got.here < first_jackals$got.here[first_jackals$mycarcass==i]])})
    

    This returns a matrix of counts with one row per breed (jackals included) and one column per carcass. Now we drop the jackals row and add carcass names to the columns:

    # drop unwanted breeds
    beat_jackals <- 
              beat_jackals[row.names(beat_jackals) != "breed_jackals",]
    # add carcass names to the columns
    colnames(beat_jackals) <- unique(arrivals$mycarcass)
    

    Because sapply processed the carcasses in the order given by unique(arrivals$mycarcass), the column names line up with the counts and we don't have to worry about misalignment.
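
    The jackal steps above generalize to any breed. As a minimal sketch (count_before is a hypothetical helper, not part of the original answer, and the data frame is rebuilt here only so the block runs on its own), the target breed can be passed as an argument:

    ```r
    # Sample data from the question, rebuilt compactly (who column omitted)
    arrivals <- data.frame(
      breed = c(rep("breed_eagles", 5), "breed_jackals",
                "breed_lappets", "breed_eagles", "breed_eagles", "breed_hyenas",
                "breed_jackals", "breed_hyenas", "breed_hyenas", "breed_jackals"),
      got.here = c(102, 108, 271, 284, 311, 5419,
                   200, 219, 256, 1759, 7948, 10988, 13629, 17013),
      mycarcass = rep(c("carcass_336", "carcass_338"), times = c(6, 8))
    )

    # Hypothetical helper: counts, per carcass, the arrivals of every other
    # breed that happened before the first arrival of `target`
    count_before <- function(arrivals, target) {
      breeds <- factor(arrivals$breed)   # fixed levels, so all tables align
      carcasses <- unique(as.character(arrivals$mycarcass))
      counts <- sapply(carcasses, function(i) {
        here <- arrivals$mycarcass == i
        t_here <- arrivals$got.here[here & arrivals$breed == target]
        # if the target never arrived at this carcass, count nothing
        first <- if (length(t_here)) min(t_here) else -Inf
        table(breeds[here & arrivals$got.here < first])
      })
      # drop the target's own row; keep the matrix shape
      counts[rownames(counts) != target, , drop = FALSE]
    }

    beat_jackals <- count_before(arrivals, "breed_jackals")
    beat_hyenas  <- count_before(arrivals, "breed_hyenas")
    ```

    Because the carcass names are passed to sapply as a character vector, the columns are named automatically. A carcass that the target breed never reached (carcass_336 for the hyenas) gets a column of zeros.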

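    To get the order of arrival by breed at each carcass, you can use the following:

    # order each carcass by arrival time; simplify=FALSE always returns a list
    arrival_order <- sapply(unique(arrivals$mycarcass), function(i) {
                             x <- arrivals[arrivals$mycarcass==i, ]
                             unique(x$breed[order(x$got.here)])}, simplify=FALSE)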
    

    This will allow you to pull out the breed that arrived immediately prior to the jackal:

    sapply(arrival_order, function(i) i[(which(i=="breed_jackals"))-1])
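
    To turn the immediately preceding breeds into the frequency table asked for in the question, one option is to tabulate them against the full set of other breeds, so that breeds which never directly precede the target still show up with a count of 0. This is only a sketch: preceding_breed is a hypothetical helper name, and the sample data is rebuilt so the block runs on its own.

    ```r
    # Sample data from the question, rebuilt compactly (who column omitted)
    arrivals <- data.frame(
      breed = c(rep("breed_eagles", 5), "breed_jackals",
                "breed_lappets", "breed_eagles", "breed_eagles", "breed_hyenas",
                "breed_jackals", "breed_hyenas", "breed_hyenas", "breed_jackals"),
      got.here = c(102, 108, 271, 284, 311, 5419,
                   200, 219, 256, 1759, 7948, 10988, 13629, 17013),
      mycarcass = rep(c("carcass_336", "carcass_338"), times = c(6, 8))
    )

    # Hypothetical helper: for each carcass, the breed whose first arrival
    # comes directly before the first arrival of `target`
    preceding_breed <- function(arrivals, target) {
      sapply(split(arrivals, as.character(arrivals$mycarcass)), function(x) {
        # breeds in order of first arrival at this carcass
        ord <- unique(as.character(x$breed[order(x$got.here)]))
        pos <- match(target, ord)
        # NA when the target never arrived or arrived first
        if (is.na(pos) || pos == 1) NA_character_ else ord[pos - 1]
      })
    }

    prev <- preceding_breed(arrivals, "breed_jackals")

    # tabulate against all non-target breeds; NA entries are ignored by table()
    others <- setdiff(unique(as.character(arrivals$breed)), "breed_jackals")
    freq <- table(preceding_breed = factor(prev, levels = others))
    as.data.frame(freq, responseName = "frequency")
    ```

    For the sample data this gives 1 for breed_eagles, 0 for breed_lappets and 1 for breed_hyenas, matching the table in the question.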