Tags: r, function, process, workflow, organization

R - Better organize program workflow and processes using functions


I am working on a data mining project that currently consists of several scripts.

What I would like to do is to better organize the processes executed in each script by using functions.

One example among many: in the script dedicated to clustering, I apply a filter for outliers:

library(dplyr)

myDF <-
        myDF %>%
        filter(distance > 680) %>%
        filter(time > 120) %>% 
        filter(speed > 5)

What I am looking for is the possibility of "wrapping" this process inside a small "node". In my mind, the thing closest to an Enterprise Miner node in R is a function. Therefore:

outlier_filter <- function() {   
            myDF %>%
            filter(distance > 680) %>%
            filter(time > 120) %>% 
            filter(speed > 5)
}

However, when I run:

outlier_filter 

It simply prints the function's code to the console. Instead, I would like it to act like a node and filter the outliers out of the data frame.

I am open to other suggestions. The main point is that simply running a single name should apply its effect to the data frame I am working on.
Another example could be a "node"/function create_features which, when run, executes the code that adds new variables to my data frame.

I hope that was clear. Thank you.


Solution

  • Typing the name without parentheses only prints the function. To call it, use outlier_filter()

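    However, with no arguments the call could only change myDF through side effects, that is, by overwriting the global variable from inside the function. That is usually not what you want. A cleaner solution is to pass the data frame in and return the filtered result: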

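    library(dplyr)

    # Take the data frame as an argument and return the filtered copy.
    # The value of the last expression is returned, so no return() is needed.
    outlier_filter <- function(df) {
      df %>%
        filter(distance > 680, time > 120, speed > 5)
    }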
    

    You can call the function using myDF <- outlier_filter(myDF)
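
    Because each such function takes a data frame and returns a new one, several "nodes" can be chained with the pipe. The following is only a minimal sketch: the toy values in myDF are made up, and create_features with its distance_per_time column is a hypothetical stand-in, since the question does not say which variables it should add.

```r
library(dplyr)

# Toy data standing in for the real myDF (values are made up for illustration).
myDF <- data.frame(
  distance = c(500, 700, 900, 1000),
  time     = c(100, 130, 150, 200),
  speed    = c(4, 6, 3, 10)
)

# Node 1: keep only rows that pass all three thresholds from the question.
outlier_filter <- function(df) {
  df %>%
    filter(distance > 680, time > 120, speed > 5)
}

# Node 2 (hypothetical): add a derived column. The real features are not
# given in the question, so this column is purely illustrative.
create_features <- function(df) {
  df %>%
    mutate(distance_per_time = distance / time)
}

# Each node takes a data frame and returns a new one, so nodes chain with %>%.
result <- myDF %>%
  outlier_filter() %>%
  create_features()

# myDF itself is unchanged until the result is assigned back to it.
nrow(myDF)
nrow(result)
```

    To apply the whole chain to the working data frame, assign the result back, for example myDF <- myDF %>% outlier_filter() %>% create_features(). Keeping the assignment explicit means each script shows exactly where myDF changes.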