How do I swap N values in one data frame column for any value of N and any n(row)?

I have data frames which contain an Age difference column (AgeDiff). The data frame looks like:

library("dplyr")

test <- data.frame("Age1"=c(42, 48, 58, 25, 53, 55, 32, 58, 71, 24, 48, 48, 64, 55, 45, 55, 34, 33, 51, 22), 
                   "Age2"=c(8, 2, 1, 16, 14, 1, 11, 14, 0, 5, 2, 10, 16, 13, 3, 4, 8, 13, 8, 5)) 
test <- test %>%
mutate(AgeDiff = Age1 - Age2)

For the function I am writing, the user can enter a rejection minimum difference and/or a rejection maximum difference. Any age differences smaller than/larger than either threshold create an "out of scope" age difference.

In the work I am doing, any age difference less than 18 years or older than 54 years is "out of scope". By coincidence (I used a random number generator for both sets of ages), there are exactly two age differences that are too young, and two age differences that are too old.

I can find the maximum of "too young" or "too old", for example by comparing

TooYoung <- test %>%
filter(AgeDiff < 18) %>%
summarise(Count = n()) %>%
pull(Count)

with

TooOld <- test %>%
filter(AgeDiff > 54) %>%
summarise(Count = n()) %>%
pull(Count)

and then just working out which value is larger. The larger of TooYoung and TooOld gives me the number of rows I need to swap in the test data.

I can (pre)sort the test data frame so that AgeDiff is ascending in value:

test <- test %>%
arrange(AgeDiff)

That gives me the order I want, where the data frame is sorted from most extreme AgeDiff minimum to most extreme AgeDiff maximum. Now what I want to do is to swap the top-most and bottom-most Age1 values for the maximum of TooYoung and TooOld. Because I have 2 as my maximum "out of scope" minima/maxima counts in this exanple, I need to swap:

Age1 in row 1 with Age1 in row 20
Age1 in row 2 with Age1 in row 19

It does not matter if the swap results in an "out of scope" AgeDiff.

The data frame for the swap can be any length. The number of swaps to be made can be any number, including 0. So the problem becomes, for any swap value >0,

Age1 in row 1 with Age1 in nrow(foo)
Age1 in row 2 with Age1 in nrow(foo)-1
and so forth for the number of swaps to be made.

AgeDiff is going to be recalculated after the swaps are made. There are other variables in my data frame, such as Sex, so it is critical that only the Age1 values are swapped.

Reordering of the rows is unimportant. The only requirement is for the solution to swap the correct pairs of Age1 values.

I've searched for similar questions, but the ones I have found have been quite different. The other questions are a two-row swap for an initial percentage of the data frame, a swap of two known values for each other, swapping entire rows, swap of two randomly selected rows, value swaps based on grouping variables. In my problem, the number of swaps will be calculated with certainty, but the number varies between populations, the Age1 values to swap will differ, the number of Age1 values to swap must be exactly the maximum "out of scope" count, and there are no grouping variables.

Edited to add: assuming you have my data and have done the arrange, you will see that row 1 looks like:

Age1    Age2    AgeDiff
25      16      9

and row 20 looks like:

Age1    Age2    AgeDiff
71      0       71

Post swap these two rows would be: row 1:

Age1    Age2    AgeDiff
71      16      9

row 20:

Age1    Age2    AgeDiff
25      0       71

so just the two Age1 values are swapped.

Then the row 2 and row 19 swap, to end up with

row 2

Age1    Age2    AgeDiff
58      5       17

and for row 19

Age1    Age2    AgeDiff
22      1       57

The AgeDiff column is ignored as it is recalculated after the finishing the swap(s).

(I also missed that the initial data frame should also have been called test, I have now fixed that.)

Solution

im sure theres a much neater way to do this but....

library("dplyr")
test <- data.frame("Age1"=c(42, 48, 58, 25, 53, 55, 32, 58, 71, 24, 48, 48, 64, 55, 45, 55, 34, 33, 51, 22), 
                   "Age2"=c(8, 2, 1, 16, 14, 1, 11, 14, 0, 5, 2, 10, 16, 13, 3, 4, 8, 13, 8, 5)) 
test <- test %>%
  mutate(AgeDiff = Age1 - Age2) %>% 
  arrange(AgeDiff) %>% 
  dplyr::mutate(row_no = row_number())
test

swap <- function(df) {
  TooYoung <- df %>%
    filter(AgeDiff < 18) %>%
    summarise(Count = n()) %>%
    pull(Count)
  
  TooOld <- df %>%
    filter(AgeDiff > 54) %>%
    summarise(Count = n()) %>%
    pull(Count)
  
  top_bottom <- max(TooYoung, TooOld)
  
  df2 <- df %>% 
    filter(row_number() > max(row_number()) - top_bottom | row_number() <= top_bottom) %>% 
    mutate(final_age1 = Age1) %>% 
    dplyr::select(final_age1, row_no)
  df2$row_no <- sort(df2$row_no, decreasing = T)
  
  df_final <- df %>% 
    left_join(df2) %>% 
    mutate(final_age1 = ifelse(is.na(final_age1), Age1, final_age1)) %>% 
    dplyr::select(-Age1, -row_no)
             
  df_final
}
swap(test)

which i think gives you what you want?

# Joining, by = "row_no"
#    Age2 AgeDiff final_age1
# 1    16       9         71
# 2     5      17         58
# 3     5      19         24
# 4    13      20         33
# 5    11      21         32
# 6     8      26         34
# 7     8      34         42
# 8    10      38         48
# 9    14      39         53
# 10   13      42         55
# 11    3      42         45
# 12    8      43         51
# 13   14      44         58
# 14    2      46         48
# 15    2      46         48
# 16   16      48         64
# 17    4      51         55
# 18    1      54         55
# 19    1      57         22
# 20    0      71         25