Search code examples
sqlrmachine-learningone-hot-encoding

How to set flag value based on data that use one-hot-encoding


I have a database consisting of three tables like this:

database

I want to make a machine learning model in R using that database, and the data I need is like this:

Required data

I can use one hot encoding to convert categorical variable from t_pengolahan (such as "Pengupasan, Fermentasi, etc") into attributes. But, how to set flag (yes or no) to the data value based on "result (using SQL query)" data above?


Solution

  • We can combine two answers to previous related questions, each of which provides half of the solution; those answers are found here and here:

    library(dplyr) ## dplyr and tidyr loaded for wrangling
    library(tidyr)
    options(dplyr.width = Inf) ## we want to show all columns of result
    yes_fun <- function(x) { ## helps with pivot_wider() below
        if ( length(x) > 0 ) {
            return("yes")
        }
    }
    sql_result %>%
        separate_rows(pengolahan) %>% ## add rows for unique words in pengolahan
        pivot_wider(names_from = pengolahan, ## spread to yes/no indicators
                    values_from = pengolahan,
                    values_fill = list(pengolahan = "no"),
                    values_fn = list(pengolahan = yes_fun))
    

    Data

    id_pangan  <- 1:3
    kategori   <- c("Daging", "Buah", "Susu")
    pengolahan <- c("Penggilingan, Perebusan", "Pengupasan",
                    "Fermentasi, Sterilisasi")
    batas      <- c(100, 50, 200)
    sql_result <- data.frame(id_pangan, kategori, pengolahan, batas)
    
    # A tibble: 3 x 8
      id_pangan kategori batas Penggilingan Perebusan Pengupasan
          <int> <fct>    <dbl> <chr>        <chr>     <chr>     
    1         1 Daging     100 yes          yes       no        
    2         2 Buah        50 no           no        yes       
    3         3 Susu       200 no           no        no        
      Fermentasi Sterilisasi
      <chr>      <chr>      
    1 no         no         
    2 no         no         
    3 yes        yes