Search code examples
rdataframe

How to create a equally spaced number within a fixed range with a group in R


I have the following dataset:

df <- tribble(
~opp_id, ~month, ~count,
"304956938","Oct 2023","2",
"304956938","Oct 2023","2",
"305075384","Nov 2023","1",
"304910225","Dec 2023","2",
"305101457","Dec 2023","2",
"305005905","Feb 2024","1",
"305089124","Mar 2024","3",
"304955132","Mar 2024","3",
"304955132","Mar 2024","3",
"305005359","Jun 2024","2",
"304904187","Jun 2024","2",
"304973572","Aug 2024","1",
"304984865","Sep 2024","1",
)

I wished to create a equally spaced number between 1 and 100, for each "month"

Basically, there are different number of opportunities in different months. For example, there are 2 opportunities in "Oct 2023", while there are 3 opportunities in "Mar 2024". Then for Oct 2023, I wish to cut 100 by 3 (# of opportunities + 1) and put two opportunities in 33.3 and 66.6 respectively. For Mar 2024, the position of the three opportunities would be 25, 50, 75 respectively (order doesn't matter). Ideal output would look like the following (for the two months I just mentioned)

~opp_id, ~month, ~count, ~position
"304956938","Oct 2023","2",33.3,
"304956938","Oct 2023","2",66.6,
"305089124","Mar 2024","3",25,
"304955132","Mar 2024","3",50,
"304955132","Mar 2024","3",75

I can generate random number within each group (month), but I can't think of a way to achieve the desired result I mentioned above.


Solution

  • With dplyr, you can use row_number() / (n() + 1):

    library(dplyr)
    
    df %>%
      mutate(position = row_number() / (n() + 1) * 100, .by = month)
    
    # # A tibble: 13 × 4
    #    opp_id    month    count position
    #    <chr>     <chr>    <chr>    <dbl>
    #  1 304956938 Oct 2023 2         33.3
    #  2 304956938 Oct 2023 2         66.7
    #  3 305075384 Nov 2023 1         50  
    #  4 304910225 Dec 2023 2         33.3
    #  5 305101457 Dec 2023 2         66.7
    #  6 305005905 Feb 2024 1         50  
    #  7 305089124 Mar 2024 3         25  
    #  8 304955132 Mar 2024 3         50  
    #  9 304955132 Mar 2024 3         75  
    # 10 305005359 Jun 2024 2         33.3
    # 11 304904187 Jun 2024 2         66.7
    # 12 304973572 Aug 2024 1         50  
    # 13 304984865 Sep 2024 1         50
    

    Its base equivalent:

    transform(df,
      position = ave(seq_len(nrow(df)), month,
                     FUN = \(x) seq_along(x) / (length(x) + 1) * 100)
    )