azureazure-data-explorerkql

Take vs. Limit vs. Top, Sort vs. Order By


Using traditional SQL for long time, used to order by and limit, top is MS own "limit" but more intuitive.

Here two Kusto queries share the same condition and order by (sorting, right?), only difference is return how many, 20 vs. 200. Result is surprising:

AzureDiagnostics | where Category contains "postgresql" | take 20  | order by TimeGenerated desc

enter image description here

AzureDiagnostics | where Category contains "postgresql" | take 200  | order by TimeGenerated desc

enter image description here

top is more consistence

AzureDiagnostics | where Category contains "postgresql" | top 2  by TimeGenerated desc 
AzureDiagnostics | where Category contains "postgresql" | top 20  by TimeGenerated desc 

Update, thanks to @Yoni L, here is the summary:

  1. Sequence of each | section matters. For instance, |take 10 | order by x will sort AFTER take which is random.
  2. take = limit in terms of usage
  3. order by = sort by
  4. top 10 by x [asc/desc] = order by x [asc/desc] | take 10 except top # by x uses much less memory and should be preferred.

Solution

  • Unlike top, for take and limit (which are aliases) there is no guarantee which records are returned, unless the source data is sorted.

    The following are equivalent - top 20 by x desc, order by x desc | take 20, however this has different semantics: take 20 | order by x desc - it only orders the (up to) 20 records that were taken