Search code examples
nullimpala

Impala - replace NULL to zero for entire table


Hi experts out there -

Is there anyway to replace null to zero for an entire table in impala? So far, I've only found coalesce or case - when, which allows me to change column by column..but I have more than 210 + columns in the table so i'm looking for something more efficient if possible..

SELECT COALESCE(table1.column1,0) FROM table1

Thanks in advance!


Solution

  • This doesn't answer the general question, if one needs to replace NAs in 200 columns without typing COALESCE(table1.column1,0) 200+ times.

    But I can see from your profile you probably use R. So do I, and I have the same issue: I need to replace NAs in my table with 200 columns.

    My solution is to use implyr with R.

    # libraries
    library(pacman)
    p_load(implyr
           ,odbc
           ,tidyverse
    )
    
    # connect to database
    con <- src_impala(drv = odbc::odbc(),
                      HOST="host.xxx.x",
                      PORT=12345,
                      ...)
    
    # connect to table
    table1 <- tbl(con,table1)
    
    # get columns with I want to replace NAs in, in my case it's numeric tables
    numeric_columns_to_replace_NAs <- table1 %>% 
                                      select_if(is.numeric) %>% 
                                      colnames
    
    # the command which replaces all NAs with 0's
    query <- table1 %>% 
             mutate_at(.vars = vars(numeric_columns_to_replace_NAs),
                       .funs = funs(coalesce(.,0))) 
    
    # run command - this will print to the console
    query 
    

    you can also compute() or collect() your results depending on what you need (see docs)

    If you need the query in hive, you can extract the code this way:

    # some object with the class "tbl_impala"
    query_object <- query %>% 
                  collapse() %>% 
                  .[2]
    
    # actual query which can be passed to hive via R. 
    sql_query <- query_object[[1]]$x %>% 
               as.character()
    
    # create new table
    new_query <- paste0("CREATE TABLE table2 AS ",sql_query) %>% 
               str_replace_all(pattern = "'",replacement = '"') # this cleans up the code so it works
    
    # if you want text without characters like \n in there.
    cat(sql_query)