Given the following data frame from DataFrames.jl:
julia> using DataFrames
julia> df = DataFrame(x1=[1, 2, 3], x2=Union{Int,Missing}[1, 2, 3], x3=[1, 2, missing])
3×3 DataFrame
Row │ x1 x2 x3
│ Int64 Int64? Int64?
─────┼────────────────────────
1 │ 1 1 1
2 │ 2 2 2
3 │ 3 3 missing
I would like to find columns that contain missing
value in them.
I have tried:
julia> names(df, Missing)
String[]
but this is incorrect as the names
function, when passed a type, looks for subtypes of the passed type.
If you want to find columns that actually contain missing
value use:
julia> names(df, any.(ismissing, eachcol(df)))
1-element Vector{String}:
"x3"
In this approach we iterate each column of the df
data frame and check if it contains at least one missing value.
If you want to find columns that potentially can contain missing value you need to check their element type:
julia> names(df, [eltype(col) >: Missing for col in eachcol(df)]) # using a comprehension
2-element Vector{String}:
"x2"
"x3"
julia> names(df, .>:(eltype.(eachcol(df)), Missing)) # using broadcasting
2-element Vector{String}:
"x2"
"x3"