The example in a section about 'list context' in the polars-book uses the pl.col("") expression with an empty string "" as the argument.
```python
import polars as pl

# the percentage rank expression
rank_pct = pl.col("").rank(descending=True) / pl.col("").count()
```
From the context and the output I can guess what the pl.col("") expression does. But the API documentation does not seem to cover the case of an empty string as the argument to pl.col, and I would like to know its precise meaning in this use case. Any helpful answer is greatly appreciated!
The precise meaning is to act as a 'root' Expression that starts a chain of Expressions inside a List context, i.e., inside list.eval(....). I'll need to take a step back to explain...
In general, only certain types of Expressions are allowed to start (or be the 'root' of) an Expression. These 'root' Expressions work with a particular context (select, filter, with_columns, etc.) to identify what data is being addressed.
Some examples of root Expressions are polars.col, polars.map_batches, polars.map_groups, polars.first, polars.last, polars.all_horizontal, and polars.any_horizontal. (There are others.)
Once we declare a "root" Expression, we can then chain other, more generic Expressions to perform work. For example, polars.col("my_col").sum().over('other_col').alias('name').
A List context is slightly different from most contexts. In a List context, there is no ambiguity as to what data is being addressed: there is only a list of data. As such, polars.col and polars.first were chosen as "root" Expressions to use within a List context.
Normally, a polars.col root Expression contains information such as a string denoting a column name, or a wildcard expression denoting multiple columns. However, this is not needed in a List context, where there is only one option: the single list itself.
As such, any string provided to polars.col is ignored in a List context. For example, this variant of the code from the Polars Guide also works:
```python
# Notice that I'm referring to columns that do not exist...
rank_pct = pl.col("foo").rank(descending=True) / pl.col("bar").count()
```
Since any string provided to a polars.col Expression is ignored in a List context, a single empty string "" is often supplied, just to avoid unnecessary clutter.
polars.element expression
Polars now has a polars.element expression designed for use in list evaluation contexts. Using polars.element is now considered idiomatic for list contexts, as it avoids the confusion associated with using col("").