Unable to create exactly equal data partitions using createDataPartition in R: getting 1396 and 1398 observations but need 1397 each

I am quite familiar with R, but I have never before needed to split a dataset randomly into two exactly equal partitions using createDataPartition.

index = createDataPartition(final_ts$SAR,p=0.5, list = F)
final_test_data = final_ts[index,]
final_validation_data = final_ts[-index,]

This code creates two datasets of 1396 and 1398 observations instead of 1397 each.

I am surprised that p=0.5 doesn't split the data exactly in half. Does it have something to do with the resulting datasets not being allowed an odd number of observations? Thanks in advance!

Solution

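It has to do with the number of cases in each class of the response variable (final_ts$SAR in your case). createDataPartition samples within each class and rounds the per-class sample size up, so a class with an odd number of cases cannot be split exactly in half at p=0.5.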

For example:

y <- rep(c(0,1), 10)
table(y)
y
0  1 
10 10 
# even number of cases in each class

Now we split:

idx <- caret::createDataPartition(y, p=0.5, list=FALSE)
train <- y[idx]
table(train) # we have 10 obs.
train
0 1 
5 5 

test <- y[-idx] # reuse the same index so test is the complement of train
table(test) # we have 10 obs.
test
0 1 
5 5

If we instead build an example with an odd number of cases in each class:

y <- rep(c(0,1), 11)
table(y)
y
0  1 
11 11

We have:

idx <- caret::createDataPartition(y, p=0.5, list=FALSE)
train <- y[idx]
table(train) # we have 12 obs.
train
0 1 
6 6 

test <- y[-idx] # reuse the same index so test is the complement of train
table(test) # we have 10 obs.
test
0 1 
5 5
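
The 12/10 split can be reproduced with plain arithmetic. The sketch below assumes, based on my reading of caret's source, that createDataPartition draws ceiling(n * p) cases from each class; the variable names are only illustrative and caret itself is not called.

```r
# Class counts from the example above: 11 zeros and 11 ones.
y <- rep(c(0, 1), 11)
p <- 0.5

# Number of cases in each class.
n_per_class <- as.vector(table(y))             # 11 11

# Assumed per-class rule: round the sample size up within each class.
n_train_per_class <- ceiling(n_per_class * p)  # 6 6, since 5.5 rounds up

n_train <- sum(n_train_per_class)              # 12
n_test  <- length(y) - n_train                 # 10
c(train = n_train, test = n_test)
```

With two classes that each have an odd count, the training part gains one extra case per class, which would explain a 1398/1396 split of 2794 rows.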

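
If the two halves must be exactly equal, one option is to skip createDataPartition and draw the index with base R's sample(). This is a minimal sketch, not a drop-in replacement: the final_ts data frame below is a made-up stand-in with 2794 rows, and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible

# Hypothetical stand-in for final_ts: 2794 rows with a binary SAR column.
final_ts <- data.frame(SAR = rbinom(2794, size = 1, prob = 0.3))

n <- nrow(final_ts)

# Draw exactly half of the row numbers, without replacement.
index <- sample(n, size = floor(n / 2))

# drop = FALSE keeps the result a data frame even with a single column.
final_test_data       <- final_ts[index, , drop = FALSE]
final_validation_data <- final_ts[-index, , drop = FALSE]

c(nrow(final_test_data), nrow(final_validation_data))
```

The trade-off is that this is simple random sampling, so the class proportions of SAR in the two halves are only approximately equal rather than stratified.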