I am working on a data mining project which is currently composed of several scripts.
What I would like to do is to better organize the processes executed in each script by using functions.
One of many examples is the following:
in the script dedicated to clustering, I apply a filter for outliers:
library(dplyr)
myDF <- myDF %>%
  filter(distance > 680) %>%
  filter(time > 120) %>%
  filter(speed > 5)
What I am looking for is the possibility of "wrapping" this process inside a small "node". In my mind, the thing closest to an Enterprise Miner node in R is a function. Therefore:
outlier_filter <- function() {
  myDF %>%
    filter(distance > 680) %>%
    filter(time > 120) %>%
    filter(speed > 5)
}
However, when I run:
outlier_filter
It simply prints the code on the console. Instead, I would like it to act like a node and filter the outliers from the data frame.
I am open to other suggestions; however, the main point is that by simply executing a single word, I want it to apply its effect to the data frame I am working on.
Another example could be the "node"/function create_features
which when run, executes the code to add the new variables to my dataframe.
I hope I was clear. Thank you.
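You are just printing the function. You probably want to call it using outlier_filter()
However, with no arguments the function could only change myDF through side effects, that is, by modifying a variable that lives outside the function. Usually this is not something you want. A better solution is to pass the data frame in as an argument and return the filtered result: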
outlier_filter <- function(df) {
  # The value of the last expression is returned automatically,
  # so there is no need to pipe into return()
  df %>%
    filter(distance > 680) %>%
    filter(time > 120) %>%
    filter(speed > 5)
}
You can call the function using myDF <- outlier_filter(myDF)
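The same pattern extends to the other "nodes" mentioned in the question. Below is a minimal sketch of how outlier_filter and create_features could be chained. The sample data and the distance_per_time variable are assumptions made up for illustration, since the question does not say which features are created.

```r
library(dplyr)

# Hypothetical sample data; only the column names come from the question
myDF <- data.frame(
  distance = c(700, 500, 900, 1000),
  time     = c(130, 200, 100, 150),
  speed    = c(6, 7, 8, 4)
)

# "Node" 1: takes a data frame, returns the filtered data frame
outlier_filter <- function(df) {
  df %>%
    filter(distance > 680) %>%
    filter(time > 120) %>%
    filter(speed > 5)
}

# "Node" 2: takes a data frame, returns it with a new column.
# distance_per_time is a hypothetical feature, not one from the question.
create_features <- function(df) {
  df %>%
    mutate(distance_per_time = distance / time)
}

# Each node receives the output of the previous one
result <- myDF %>%
  outlier_filter() %>%
  create_features()
```

Because each function takes a data frame and returns a data frame, myDF itself is left unchanged unless you assign the result back to it, and the nodes can be reordered or reused on other data frames.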