Search code examples
dataframejuliamissing-data

Remove columns with all values missing in dataframe Julia


I have the following dataframe called df:

df = DataFrame(i=1:5,
               x=[missing, missing, missing, missing, missing],
               y=[missing, missing, 1, 3, 6])
5×3 DataFrame
 Row │ i      x        y       
     │ Int64  Missing  Int64?  
─────┼─────────────────────────
   1 │     1  missing  missing 
   2 │     2  missing  missing 
   3 │     3  missing        1
   4 │     4  missing        3
   5 │     5  missing        6

I would like to remove the columns where all values are missing. In this case it should remove column x because it has only all missing values. with dropmissing it removes all rows, but that's not what I want. So I was wondering if anyone knows how to remove only columns where all values are missing in a dataframe Julia?


Solution

    1. A mediocre answer would be:
    df1 = DataFrame()
    foreach(
      x->all(ismissing, df[!, x]) ? nothing : df1[!, x] = df[!, x],
      propertynames(df)
    )
    
    df
    # 5×2 DataFrame
    #  Row │ i      y
    #      │ Int64  Int64?
    # ─────┼────────────────
    #    1 │     1  missing
    #    2 │     2  missing
    #    3 │     3        1
    #    4 │     4        3
    #    5 │     5        6
    
    1. But a slightly better one would be using the slicing approach:
    df[:, map(x->!all(ismissing, df[!, x]), propertynames(df))]
    # 5×2 DataFrame
    #  Row │ i      y
    #      │ Int64  Int64?
    # ─────┼────────────────
    #    1 │     1  missing
    #    2 │     2  missing
    #    3 │     3        1
    #    4 │     4        3
    #    5 │     5        6
    
    # OR
    df[!, map(x->!all(ismissing, x), eachcol(df))]
    # 5×2 DataFrame
    #  Row │ i      y
    #      │ Int64  Int64?
    # ─────┼────────────────
    #    1 │     1  missing
    #    2 │     2  missing
    #    3 │     3        1
    #    4 │     4        3
    #    5 │     5        6
    
    #Or
    df[!, Not(names(df, all.(ismissing, eachcol(df))))]
    # I omitted the result to prevent this answer from becoming extensively lengthy.
    
    #Or
    df[!, Not(all.(ismissing, eachcol(df)))]
    
    1. I almost forgot the deleteat! function:
    deleteat!(permutedims(df), all.(ismissing, eachcol(df))) |> permutedims
    
    # 5×2 DataFrame
    #  Row │ i      y
    #      │ Int64  Int64?
    # ─────┼────────────────
    #    1 │     1  missing
    #    2 │     2  missing
    #    3 │     3        1
    #    4 │     4        3
    #    5 │     5        6
    
    1. You can use the select! function, as Dan noted:
    select!(df, [k for (k,v) in pairs(eachcol(df)) if !all(ismissing, v)])
    # 5×2 DataFrame
    #  Row │ i      y
    #      │ Int64  Int64?
    # ─────┼────────────────
    #    1 │     1  missing
    #    2 │     2  missing
    #    3 │     3        1
    #    4 │     4        3
    #    5 │     5        6