How to extract one random datetime per day for every individual with a 50/50 ratio of night and day times in R?

I don't have much experience with R and I am stuck on the following problem: I have data in csv format from radio-collared wildcats with datetime stamps and GPS locations (and some additional info such as sex, age, etc.). I have to balance the datasets for each individual because the frequency of the fixes (locations) is unequal. I want to extract one random position per day, for every day of my data, for every cat. In addition, the randomly chosen points per individual should have a 50/50 ratio of day and night points. For this purpose I created a column defining whether the location was recorded during the night or the day, but I don't know how to add the ratio rule to my code. I also wonder if it's possible to fix the randomly chosen points in the code, so that if someone else runs it again, they get the same random points I extracted the first time (I think it can be done with set.seed?). I often just don't know how to combine all the functions I want to use.

I think I successfully extracted one random point a day per individual with the following code:

data %>% group_by(animals_id,utc) %>%
  sample_n(1) -> result

But how can I include the 50/50 ratio of day and night points per individual, and how can I add a set.seed call?

This is the structure of my data set:

  X animals_id    acquisition_time longitude latitude
1 1          1 2010-05-01 02:59:00  7.604915 47.94362
2 2          1 2010-05-01 10:00:00  7.604967 47.94373
3 3          1 2010-05-01 16:59:00  7.605800 47.94379
4 4          1 2010-05-02 06:59:00  7.604969 47.94358
5 5          1 2010-05-02 13:59:00  7.604921 47.94008
6 6          1 2010-05-03 03:59:00  7.605051 47.94356
       projection collar_type study_area_id animals_age_class
1 EPSG:4326-WGS48         gps            13                 a
2 EPSG:4326-WGS48         gps            13                 a
3 EPSG:4326-WGS48         gps            13                 a
4 EPSG:4326-WGS48         gps            13                 a
5 EPSG:4326-WGS48         gps            13                 a
6 EPSG:4326-WGS48         gps            13                 a
  animals_sex        utc day_night
1           f 2010-05-01     night
2           f 2010-05-01       day
3           f 2010-05-01       day
4           f 2010-05-02       day
5           f 2010-05-02       day
6           f 2010-05-03     night

I am very grateful for every tip.

Solution

I defined a function that randomly samples rows of a data frame based on a grouping column. The function splits the data frame by the groups in that column, randomly selects an equal number of rows from each group, and binds the samples back into one data frame. The n.each parameter specifies the number of rows sampled from each group; the default is n.each = 1. The sample can be reproduced by specifying the seed parameter; the default is seed = 1. The colname parameter is the column name and should be quoted. Note that row names are removed from the resulting samples.

Here are the function and some examples:

library(data.table)
library(magrittr)

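sample_equal <- function(df, colname, n.each = 1, seed = 1) {

  # Sample n.each rows from one group: transpose so that rows become
  # columns, sample those columns, then transpose back.
  eqsamp <- function(df) {
    set.seed(seed)  # reset for every group so the draw is reproducible
    df %>%
      transpose %>%
      sample(n.each) %>%
      transpose
  }

  # Split by the grouping column, sample each piece, and bind the pieces.
  sampled <- df %>%
    split(df[colname]) %>%
    lapply(eqsamp) %>%
    do.call(rbind, .)

  rownames(sampled) <- NULL
  colnames(sampled) <- colnames(df)
  return(sampled)
}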

# Example 1: applied to `iris`

iris %>% sample_equal('Species', 2, seed = 3)

#  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#1            5         3.6          1.4         0.2     setosa
#2          4.8         3.4          1.6         0.2     setosa
#3          6.5         2.8          4.6         1.5 versicolor
#4          5.9           3          4.2         1.5 versicolor
#5          6.5           3          5.8         2.2  virginica
#6          6.4         2.7          5.3         1.9  virginica

# Example 2: multistage sampling applied to `mtcars`

sample_equal(mtcars, 'cyl', 3, seed = 5) %>%
  sample_equal('gear', 2, seed = 3)

#   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#1 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#2 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#3 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#4 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#5 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#6 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
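
One thing to be aware of: in the `iris` output above the numbers print as `5` and `3` rather than `5.0` and `3.0`, which suggests that transposing a data frame with mixed column types coerces everything to character. If the original column types matter, a variant that samples row indices instead of transposing may be easier to work with. The following is only a minimal sketch under that assumption; `sample_equal_rows` is a name I made up for illustration, and unlike `sample_equal` it sets the seed once rather than once per group, so it will not pick the same rows as the function above.

```r
# Hypothetical variant of sample_equal() that keeps column types intact
# by sampling row indices instead of transposing the data frame.
sample_equal_rows <- function(df, colname, n.each = 1, seed = 1) {
  set.seed(seed)  # set once, so every group gets a different draw

  # Row indices of each group; drop = TRUE skips unused factor levels.
  groups <- split(seq_len(nrow(df)), df[[colname]], drop = TRUE)

  # Pick n.each indices per group (errors if a group has fewer rows).
  picked <- lapply(groups, function(idx) {
    idx[sample.int(length(idx), n.each)]
  })

  out <- df[unlist(picked, use.names = FALSE), , drop = FALSE]
  rownames(out) <- NULL
  out
}

res <- sample_equal_rows(iris, "Species", 2, seed = 3)
str(res)  # Sepal.Length etc. stay numeric, Species stays a factor
```

Calling it again with the same seed should return the same rows, which is the reproducibility asked about in the question.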

To apply it to your data, if you want to select one day point and one night point, this should work:

result <- data %>% group_by(animals_id, utc) %>%
  sample_equal('day_night', seed = 3)

The result will have one night and one day, but the latitude and longitude in the samples may be exactly the same.

If you need unique locations, you can group by latitude or longitude and then sample day/night. You also need to specify the number of day/night points sampled for each latitude or longitude. For example:

result2 <- data %>% group_by(animals_id, utc, latitude) %>%
  sample_equal('day_night', seed = 3)
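
As far as I can tell from its definition, `sample_equal` splits only on the column passed as `colname` and does not look at dplyr groups, so it is worth checking that the result really contains one point per animal and day. Below is a rough sketch of an alternative that enforces both rules explicitly with dplyr: first one random fix per animal and day, then, per animal, the more frequent of day/night is randomly thinned down to the size of the rarer one. The data frame `fixes` and the helper columns `n_keep` and `pick` are made up for illustration, and thinning drops some days, which may or may not be acceptable for your analysis.

```r
library(dplyr)

# Hypothetical toy data with the key columns from the question.
fixes <- data.frame(
  animals_id = rep(1:2, each = 12),
  utc        = rep(rep(as.Date("2010-05-01") + 0:3, each = 3), times = 2),
  day_night  = rep(c("night", "day", "day"), times = 8),
  stringsAsFactors = FALSE
)

set.seed(42)  # fixing the seed makes both random steps repeatable

# Step 1: one random fix per animal and day.
one_per_day <- fixes %>%
  group_by(animals_id, utc) %>%
  slice_sample(n = 1) %>%
  ungroup()

# Step 2: per animal, keep equally many day and night fixes.
balanced <- one_per_day %>%
  group_by(animals_id) %>%
  mutate(n_keep = min(sum(day_night == "day"),
                      sum(day_night == "night"))) %>%
  group_by(animals_id, day_night) %>%
  mutate(pick = sample(n())) %>%   # random rank within each class
  ungroup() %>%
  filter(pick <= n_keep) %>%       # keep n_keep fixes of each class
  select(-pick, -n_keep)

balanced
```

If an animal ends up with only day fixes (or only night fixes) after step 1, `n_keep` is 0 and that animal drops out entirely, so with real data it is worth comparing `one_per_day` and `balanced` to see how many days are lost.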