
Julia subset function not skipping missing values?


The following snippet works as long as my filtering function explicitly checks for missing values and rejects them:

function subset_simple(code)::Bool
    if ismissing(code)
        return false
    end
    code == 1 || code == 2 || code == 3
end

fe2 = subset(fe, :value => ByRow(subset_simple), skipmissing=true);

If I remove the ismissing check, subset throws:

TypeError: non-boolean (Missing) used in boolean context

I find that very strange, since I thought subset with skipmissing=true was designed to skip over missing values. Why do I have to check for them in my own function as well?


Solution

  • Dan is right. To expand on that, here are two options that I find elegant:

    subset(fe, :value => ByRow(in((1, 2, 3))), skipmissing=true)
    

    or

    subset(fe, :value => ByRow(in(Set((1, 2, 3)))))
    

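    The first version needs skipmissing=true and the second does not, because in on a tuple returns missing for a missing input (which subset then treats as false), while in on a Set returns false: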

    julia> in((1,2,3))(missing)
    missing
    
    julia> in(Set((1,2,3)))(missing)
    false
    

    For a large number of options, Set would in general also be faster.
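
To see where the error in the question comes from, here is a minimal sketch in plain Julia (no DataFrames needed). The names unguarded and coalesced are hypothetical helpers introduced only for illustration, and the coalesce variant is an extra option that is not part of the answer above. The point it tries to show is that the TypeError is raised inside the predicate itself, by ||, before subset ever sees a result that skipmissing=true could handle.

```julia
# Base Julia only; no packages required.

# The predicate from the question without the ismissing guard
# (hypothetical name, for illustration).
unguarded(code) = code == 1 || code == 2 || code == 3

# `missing == 1` evaluates to `missing`, and `||` only accepts a Bool on its
# left-hand side, so the call throws inside the predicate.
threw_type_error = try
    unguarded(missing)
    false
catch err
    err isa TypeError
end

# `in` on a tuple compares elements with `==`, so a missing input propagates.
tuple_result = in((1, 2, 3))(missing)

# `in` on a Set looks the value up by hash and `isequal`, giving a plain Bool.
set_result = in(Set((1, 2, 3)))(missing)

# Alternative (an assumption, not from the answer): turn a missing result
# into false inside the predicate itself.
coalesced(code) = coalesce(code in (1, 2, 3), false)

println("unguarded(missing) threw TypeError: ", threw_type_error)
println("tuple membership: ", tuple_result)
println("Set membership:   ", set_result)
println("coalesced:        ", coalesced(missing))
```

Under these assumptions, a predicate such as coalesced always returns a Bool, so it could be passed to ByRow without relying on skipmissing=true, much like the Set version.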