What is the most efficient way to remove rows containing any null values, for a table with many rows?
Additionally, why do I get the error
ERROR: 'time (attempt to use variable time without defining/assigning first
even though I have nothing named as ‘time’ when using the functional select
?[t;{(not;(null;x))} each cols t; 0b; ()]
given in this S/Oanswer?
Is your table in memory or splayed on disk?
If your table is in memory, then a simple form might also be the fastest.
show t:flip`a`b`c`d`e!(1+til 100)*/:100 cut 500?(1 0N)where 19 1
a b c d e
--------------
1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
11 11 11 11 11
12 12 12 12 12
13 13 13 13
14 14 14 14 14
15 15 15 15 15
16 16 16 16
17 17 17 17
18 18 18 18 18
..
q)\ts:1000 ?[t;{(not;(null;x))}each cols t;0b;()]
5 7552
q)\ts:1000 t where not any null flip t
2 6768
If the table is splayed on disk, the game changes. In a splayed table each column is stored as a file. The second argument of the Select operator is a list of constraints; each one a parse tree. Whereas null flip t
above tests every cell of t
, Select applies is constraints successively: only rows that have passed previous constraints are tested for the next. If no rows satisfy the first constraint/s subsequent column files may never need to be read at all.
From this follows the rule: list your constraints in descending order of restrictiveness – if nulls are more common in certain columns, test them first.
Regarding an in-memory test, note that not any
can be a good deal faster than all not
, depending on whether nulls are common.
q)q:null flip t
q)\ts:10000 all not q
9 1168
q)\ts:10000 not any q
6 656