r, aggregate, group-summaries

Finding percentage frequency of outcomes over groups in R


I have a very large data frame, representing time series data from an agent-based model, that looks like this:

[Image: ABM Model Run Data]

Each row in this dataset represents a single cycle of the model, which can run for an arbitrary length of time and terminate in one of three endings: "unity," "stability," or "instability."

I'm building a big graph that displays time series data faceted by dimensions and connections, and I want to separate the runs by ending, so that all the runs with a particular ending get their own color in the graph. I want the thickness of each line to reflect the relative frequency with which that kind of ending occurred in that batch.

In order to do this, I need to add another column to this data, "count," that counts the number of times a particular ending occurs in a batch of runs grouped by dimensions and connections, and then have that number appear in each row characterized by that ending.

So, let's say runs 1 through 8 are dimensions==4 and connections==2. Four of those runs end in "stability," two in "instability," and two in "unity." I'd like the "count" column to be 4, 2, and 2, respectively, in each row of that batch with the corresponding ending.

This is a tough one. Thanks in advance!


Solution

  • I can't test this without reproducible data, but something like this with dplyr should work:

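    library(dplyr)
    your_data %>%
      # Size of each dimensions/connections batch. n() counts rows, so if a run
      # spans several rows (one per cycle), count distinct run IDs instead.
      group_by(dimensions, connections) %>%
      mutate(runs_in_batch = n()) %>%
      # Rows in that batch sharing each ending, and their share of the batch
      group_by(dimensions, connections, ending) %>%
      mutate(count = n(),
             pct_in_batch_this_ending = count / runs_in_batch) %>%
      # Drop the grouping so later steps are not silently grouped
      ungroup()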
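
Since the question says each row is one cycle rather than one run, counting rows with n() would weight long runs more heavily than short ones. Here is a minimal sketch of a per-run variant on made-up toy data. The `run` and `cycle` column names and the `toy` data frame are assumptions for illustration only (the real column names aren't shown in the question), and it relies on dplyr's `n_distinct()` to count runs instead of rows.

```r
library(dplyr)

# Toy data, one row per model cycle. `run` and `cycle` are hypothetical
# column names; substitute whatever identifies a run in the real data.
toy <- data.frame(
  run         = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
  cycle       = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 1),
  dimensions  = 4,
  connections = 2,
  ending      = c(rep("stability", 5), rep("unity", 4), "instability")
)

result <- toy %>%
  # Number of distinct runs in each dimensions/connections batch
  group_by(dimensions, connections) %>%
  mutate(runs_in_batch = n_distinct(run)) %>%
  # Number of distinct runs in that batch sharing each ending
  group_by(dimensions, connections, ending) %>%
  mutate(count = n_distinct(run),
         pct_in_batch_this_ending = count / runs_in_batch) %>%
  ungroup()

print(result)
```

In this toy batch there are four runs: two end in "stability," one in "unity," and one in "instability," so every "stability" row gets a count of 2 regardless of how many cycles those runs lasted. The pct_in_batch_this_ending column could then be mapped to line width in the plot.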