
Efficient way to update a single element of a Polars DataFrame?


Polars DataFrame does not currently provide a method to update the value of a single cell. Instead, we have to use the method DataFrame.apply or DataFrame.apply_at_idx, which updates a whole column / Series. This can be very expensive when an algorithm repeatedly updates a few elements of some columns. Why is DataFrame designed this way? Looking into the code, it seems to me that Series does provide inner mutability via the method Series._get_inner_mut. Is that correct?
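To make the cost concrete, here is a toy model in plain Python, not polars code: a list stands in for a column, and the hypothetical helpers rebuild_update and inplace_update contrast writing one cell by building a whole new column (the pattern the question describes as expensive) with writing it in place. With n elements and k single-cell updates, the rebuild approach copies k·n elements, while the in-place approach copies none.

```python
def rebuild_update(column, idx, value):
    """Hypothetical helper: write one cell by building a whole new column (O(n))."""
    new_column = list(column)  # copies every element
    new_column[idx] = value
    return new_column, len(column)  # new column, number of elements copied


def inplace_update(column, idx, value):
    """Hypothetical helper: write one cell directly into an owned buffer (O(1))."""
    column[idx] = value
    return column, 0  # same column, nothing copied


n, k = 10_000, 100  # column length, number of single-cell updates
col_a = [0.0] * n
col_b = [0.0] * n
copied_a = copied_b = 0

for step in range(k):
    col_a, c = rebuild_update(col_a, step, float(step))
    copied_a += c
    col_b, c = inplace_update(col_b, step, float(step))
    copied_b += c

print(copied_a, copied_b)  # 1000000 0
```

Both strategies produce the same column; only the amount of copying differs, and it grows with the column length for the rebuild approach.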


Solution

  • As of polars >= 0.15.9, mutating any data backed by numbers is constant-time, O(1), provided the data is not shared. That covers numeric data as well as dates and durations.

    If the data is shared, we must first copy it so that we become its sole owner.

    import polars as pl
    import matplotlib.pyplot as plt
    from time import perf_counter
    
    ts = []
    ts_shared = []
    clone_times = []
    ns = []
    
    for n in [1e3, 1e5, 1e6, 1e7, 1e8]:
        # eager=True makes pl.zeros return a Series
        # (recent polars returns an expression by default)
        s = pl.zeros(int(n), eager=True)
        
        t0 = perf_counter()
        # we are the only owner,
        # so the mutation is done in place
        s[10] = 10.0
        t = perf_counter() - t0
        
        # store data points
        ts.append(t)
        ns.append(n)
        
        # cloning is cheap: the underlying data is not copied
        t0 = perf_counter()
        s2 = s.clone()
        t = perf_counter() - t0
        clone_times.append(t)
        
        # now there are two owners of the memory;
        # writing to it means all the data must be copied first
        t0 = perf_counter()
        s2[11] = 11.0
        t = perf_counter() - t0
        ts_shared.append(t)
    
    plt.plot(ns, ts_shared, label="writing to shared memory")
    plt.plot(ns, ts, label="writing to owned memory")
    plt.plot(ns, clone_times, label="clone time")
    plt.xlabel("number of elements")
    plt.ylabel("time (s)")
    plt.legend()
    plt.show()


    In Rust this dispatches to set_at_idx2, which has not been released yet. Note that when you use the lazy engine, all of this is done implicitly for you.
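The copy-on-write behaviour described above can be illustrated with a toy model. This is a hypothetical sketch in plain Python, not the polars implementation: SharedBuffer and CowSeries are made-up names, and an explicit refcount stands in for however the real library tracks ownership of the underlying memory. Cloning only bumps the count; a write through a handle whose buffer is shared first copies the data (O(n)) and then writes, while a write to an owned buffer happens in place (O(1)).

```python
class SharedBuffer:
    """Hypothetical stand-in for a reference-counted block of memory."""

    def __init__(self, values):
        self.values = values
        self.refcount = 1  # number of handles pointing at this buffer


class CowSeries:
    """Toy copy-on-write series; illustrative only, not polars internals."""

    def __init__(self, buffer):
        self._buf = buffer
        self.copies = 0  # how often this handle had to copy its data

    @classmethod
    def zeros(cls, n):
        return cls(SharedBuffer([0] * n))

    def clone(self):
        # O(1): share the same buffer and bump its reference count.
        self._buf.refcount += 1
        return CowSeries(self._buf)

    def __getitem__(self, idx):
        return self._buf.values[idx]

    def __setitem__(self, idx, value):
        if self._buf.refcount > 1:
            # Shared: detach by copying all data (O(n)) so that this
            # handle becomes the sole owner of its buffer.
            self._buf.refcount -= 1
            self._buf = SharedBuffer(list(self._buf.values))
            self.copies += 1
        # Sole owner: write in place (O(1)).
        self._buf.values[idx] = value


s = CowSeries.zeros(5)
s[1] = 10        # owned: written in place, no copy
s2 = s.clone()   # O(1): both handles now share one buffer
s2[2] = 11       # shared: s2 copies the data once, then writes
s2[3] = 12       # s2 now owns its copy: written in place
s[0] = 5         # s is the sole owner again: written in place
print(s.copies, s2.copies)  # 0 1
print(s[2], s2[2])          # 0 11
```

Only the first write after a clone pays for a copy; every later write through either handle is in place. Rust's standard library exposes the same pattern as Arc::make_mut, which clones the inner value only when other strong references exist.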