Tags: r, neural-network, row

Randomise across columns for half a dataset


I have a data set for MMA bouts.

The structure is currently:

Fighter 1, Fighter 2, Winner
x             y          x 
x             y          x
x             y          x
x             y          x
x             y          x

My problem is that Fighter 1 is always the Winner, so a model trained on this data will learn that Fighter 1 always wins.

I need to randomly swap Fighter 1 and Fighter 2 for half of the data set so that the winner appears equally often in both columns.

Ideally I would have this:

Fighter 1, Fighter 2, Winner
x            y         x
y            x         x
x            y         y
y            x         x
x            y         y

Is there a way to randomise across columns without changing the order of the rows?


Solution

  • I'm assuming your xs and ys are arbitrary placeholders. I'll further assume that the Winner column should stay the same and that you only need the winner to not always be in the first column.

    Sample data:

    set.seed(42)
    x <- data.frame(
      F1 = sample(letters, size = 5),
      F2 = sample(LETTERS, size = 5),
      stringsAsFactors = FALSE
    )
    x$W <- x$F1
    x
    #   F1 F2 W
    # 1  x  N x
    # 2  z  S z
    # 3  g  D g
    # 4  t  P t
    # 5  o  W o
    

    Choose some rows to change, randomly:

    (ind <- sample(nrow(x), size = ceiling(nrow(x)/2)))
    # [1] 3 5 4
    

    This means that we expect rows 3-5 to change.

    Now swap F1 and F2 on those rows (within() returns a modified copy, so assign the result if you want to keep it):

    within(x, { tmp <- F1[ind]; F1[ind] <- F2[ind]; F2[ind] <- tmp; rm(tmp) })
    #   F1 F2 W
    # 1  x  N x
    # 2  z  S z
    # 3  D  g g
    # 4  P  t t
    # 5  W  o o
    

    Rows 1-2 still show the F1 as the Winner, and rows 3-5 show F2 as the Winner.
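
As an alternative to the temporary variable, the same swap can be sketched with plain data frame indexing. This is a minimal sketch under a few assumptions: the `bouts` data and the `swapped` and `F1_won` names are hypothetical, and the 0/1 label column is an extra step that the question does not ask for but that a model would likely need as its target.

```r
# Hypothetical toy data in the same shape as the sample above:
# the winner always starts in column F1.
set.seed(1)
bouts <- data.frame(
  F1 = c("a", "b", "c", "d", "e", "f"),
  F2 = c("A", "B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)
bouts$W <- bouts$F1

# Pick half of the rows (rounded up) at random.
ind <- sample(nrow(bouts), size = ceiling(nrow(bouts) / 2))

# Swap F1 and F2 on those rows only. Data frame assignment matches
# columns by position, so selecting them in reverse order on the
# right-hand side performs the swap without a temporary variable.
swapped <- bouts
swapped[ind, c("F1", "F2")] <- bouts[ind, c("F2", "F1")]

# Assumed extra step: a 0/1 label saying whether Fighter 1 won.
swapped$F1_won <- as.integer(swapped$W == swapped$F1)
```

The row order and the `W` column are untouched; only the two fighter columns trade places on the sampled rows, so `F1_won` is 1 on the rows that were left alone and 0 on the swapped ones.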