Standard way to use a UDF in Polars

import polars as pl

def udf(row):
    # map_rows passes each row to the UDF as a plain tuple
    print(row)
    print(row[0])
    print(row[1])
    return row

df = pl.DataFrame({'a': [1], 'b': [2]})
df = df.map_rows(udf)

gives the output:

(1, 2)
1
2

However, I would like to index the row by column name using the [] notation. Is there a specific reason the row arrives as a tuple by default? When I use

def udf(row):
    print(row['a'])
    print(row['b'])
    return row
df = pl.DataFrame({'a': [1], 'b': [2]})
df = df.map_rows(udf)

I get

TypeError: tuple indices must be integers or slices, not str

How do I make the [] notation work with column names in custom UDFs?

Solution

For a start, you should always prefer native Polars expressions over custom Python functions. But if you are sure you need a UDF, here are some options.

From the documentation of map_rows():

The frame-level map_rows cannot track column names (as the UDF is a black-box that may arbitrarily drop, rearrange, transform, or add new columns); if you want to apply a UDF such that column names are preserved, you should use the expression-level map_elements syntax instead.
So, to keep the column names, combine the columns into a struct with pl.struct() and use map_elements() to apply the function.

Solution 1

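import polars as pl

def udf(row):
    # each struct element arrives as a dict keyed by column name
    print(row['a'])
    print(row['b'])
    return row

df = pl.DataFrame({'a': [1], 'b': [2]})

# pack all columns into one struct column, then apply the UDF to each element
df.select(pl.struct(pl.all()).map_elements(udf))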

Output:
1
2

Solution 2

You can also adjust your function so that it converts column names to indices:

import polars as pl

def udf(row, cols):
    # row is still a tuple; cols maps each column name to its position
    print(row)
    print(row[cols['a']])
    print(row[cols['b']])
    return row

df = pl.DataFrame({'a': [1], 'b': [2]})
cols = {v: i for i, v in enumerate(df.columns)}

df = df.map_rows(lambda x: udf(x, cols))

Solution 3

You can use the rows() method with named=True.
Or, as @Henry Harbeck mentioned in the comments, use iter_rows(named=True) so that the rows are not all materialized at once:

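import polars as pl

def udf(row):
    # with named=True each row is a dict keyed by column name
    print(row['a'])
    print(row['b'])
    return row

df = pl.DataFrame({'a': [1], 'b': [2]})

# rebuild a DataFrame from the dicts returned by the UDF
df = pl.DataFrame(udf(r) for r in df.iter_rows(named=True))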