Search code examples
rdataframefunctionrow

How to run a function for each row in a dataframe in R?


I have a data frame with the high and low values for each day in different periods of time (A, B, C, D). I want to get the most frequent value in that range of values. This a reproducible example of my dataframe:

library(modeest)
library(tidyverse)

st <- as.Date("2021-05-19")
en <- as.Date("2021-07-16")

day_month = seq(st, en, by = "1 days")

low_A = seq(1.25, 15.75, by = 0.25)
high_A = seq(2.25, 16.75, by = 0.25)
low_B = seq(0.25, 14.75, by = 0.25)
high_B = seq(0.50, 15, by = 0.25)
low_C = seq(1.25, 15.75, by = 0.25)
high_C = seq(2.25, 16.75, by = 0.25)
low_D = seq(0.75, 15.25, by = 0.25)
high_D = seq(2.25, 16.75, by = 0.25)

df <- data.frame(day_month, high_A, low_A, high_B, low_B, high_C, low_C, high_D, low_D)

Given that I have a range of values for each day the price touch each value between the high and the low of that day. So for a given day, let's say the last day of my data frame which is 2021-07-16 the highest value was 16.75 while the lowest value of that day was 14.75.

tail(df)
    day_month high_A low_A high_B low_B high_C low_C high_D low_D
54 2021-07-11  15.50 14.50  13.75 13.50  15.50 14.50  15.50 14.00
55 2021-07-12  15.75 14.75  14.00 13.75  15.75 14.75  15.75 14.25
56 2021-07-13  16.00 15.00  14.25 14.00  16.00 15.00  16.00 14.50
57 2021-07-14  16.25 15.25  14.50 14.25  16.25 15.25  16.25 14.75
58 2021-07-15  16.50 15.50  14.75 14.50  16.50 15.50  16.50 15.00
59 2021-07-16  16.75 15.75  15.00 14.75  16.75 15.75  16.75 15.25

So the price was between that range of value. If I use an interval of 0.25 points the price was running from 16.75, 16.50, 16.25, 16.00 ...15.00, 14.75. Each time has a range of values, for time A the range on the last day of my data frame was 15.50, 15.25, 15.00, 14.75, 14.50. for time B the range of the last day was 13.75, 13.50, and so on. What I want is to find the mode (or the most frequent value) in that day using those range of values. So I create this function:

poc_2 <- function(df){
# create the sequence of each period
  x_A <- with(df, seq(low_A, high_A, by = 0.25))
  x_B <- with(df, seq(low_B, high_B, by = 0.25))
  x_C <- with(df, seq(low_C, high_C, by = 0.25))
  x_D <- with(df, seq(low_D, high_D, by = 0.25))

# the range has different lenght so I use this to make all range of the same length with  NA values  
  n <- max(length(x_A), length(x_B), length(x_C), length(x_D))
  
  length(x_A) <- n
  length(x_B) <- n
  length(x_C) <- n
  length(x_D) <- n
  
  pf <- cbind(x_A, x_B,x_C, x_D)
  
  xfd <- data.frame(pf)

# I change the format of my data frame so I can calculate the mode of all values

  long <- xfd %>% gather(x, value, x_A:x_D)

# delete the NA values that are given by the change of length

  long <- na.omit(long)

# get the mode of the last value

  return(last(mfv(long$value)))
}

This code works and returns the expected value for just one row by using:

poc_2(df[59,])
[1] 16.75

Which gives the highest mode of that range on the last day. I want to calculate this for each row of my data frame. I tried several options that I found.

df %>% rowwise() %>% mutate(poc = poc_2())

# gives an error:
Error: Problem with `mutate()` input `poc`.
x argument "df" is missing, with no default
i Input `poc` is `poc_2()`.
i The error occurred in row 1.

I also tried:

apply(df, 1, poc_2)

Error in eval(substitute(expr), data, enclos = parent.frame()) : 
  invalid 'envir' argument of type 'character'

My question is: Is there any way I can get this function for all rows of my data frame so I can store a new variable with the mode of that day?


Solution

  • If you want to use your poc_2 function you can do that by using one of the option below -

    library(tidyverse)
    
    #1. sapply split
    sapply(split(df, seq(nrow(df))),poc_2) 
    
    #2. by
    by(df, seq(nrow(df)),poc_2) 
    
    #3. tidyverse
    df %>% group_split(row_number()) %>%map_dbl(poc_2)
    
    #4.
    df %>% rowwise() %>% mutate(poc = poc_2(cur_data()))
    

    However, why not use the Mode function from here and apply it for each day ?

    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }
    
    df %>%
      pivot_longer(cols = -day_month) %>%
      group_by(day_month) %>%
      summarise(frequent_value = Mode(value)) 
    
    #   day_month  frequent_value
    #   <date>              <dbl>
    # 1 2021-05-19           2.25
    # 2 2021-05-20           2.5 
    # 3 2021-05-21           2.75
    # 4 2021-05-22           3   
    # 5 2021-05-23           3.25
    # 6 2021-05-24           3.5 
    # 7 2021-05-25           3.75
    # 8 2021-05-26           4   
    # 9 2021-05-27           4.25
    #10 2021-05-28           4.5 
    # … with 49 more rows
    

    Or using rowwise -

    df %>%
      rowwise() %>%
      mutate(frequent_value = Mode(c_across(-day_month))) %>%
      ungroup