Polars Rust melt() significantly slower than R stack()

I have some R code that takes a wide data.frame and stacks it into a narrow one. I rewrote it in Rust with Polars, but the Rust version is painfully slow. Am I doing something wrong that is killing the speed?

Original R version:

df = cbind(df[ncol(df)], df[ncol(df)-3], df[ncol(df)-2], df[ncol(df)-1], stack(df[1:(ncol(df)-4)]))

The stack(df[1:(ncol(df)-4)]) part takes all but the last 4 columns (usually 1,000) and stacks them. It also creates a second column which indicates which column a row came from. Then I cbind the other 4 columns back to it. R automatically repeats them to match the new length of the narrow df.
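To make the target shape concrete, here is a minimal std-only sketch of the reshaping described above. It is not how R or Polars implement it; `Melted` and `melt_sketch` are hypothetical names used only for illustration, and the value columns are assumed to be f64.

```rust
/// Long-format result: the id columns repeated once per stacked column,
/// a `variable` column naming the source column, and the stacked `value`s.
#[derive(Debug, PartialEq)]
struct Melted {
    ids: Vec<Vec<String>>,
    variable: Vec<String>,
    value: Vec<f64>,
}

/// Hypothetical helper: stacks `value_cols` (name, data) on top of each other
/// and repeats every id column so it lines up with the stacked values.
fn melt_sketch(ids: &[Vec<String>], value_cols: &[(String, Vec<f64>)]) -> Melted {
    let n_rows = value_cols.first().map_or(0, |(_, v)| v.len());
    let total = n_rows * value_cols.len();

    // Allocate every output column once, at its final length.
    let mut out = Melted {
        ids: ids.iter().map(|_| Vec::with_capacity(total)).collect(),
        variable: Vec::with_capacity(total),
        value: Vec::with_capacity(total),
    };

    for (name, values) in value_cols {
        assert_eq!(values.len(), n_rows, "all value columns must have the same length");
        // Repeat each id column once per stacked column (what cbind does by recycling).
        for (dst, src) in out.ids.iter_mut().zip(ids) {
            assert_eq!(src.len(), n_rows, "id columns must match the row count");
            dst.extend_from_slice(src);
        }
        // Record which source column each stacked value came from.
        out.variable.extend(std::iter::repeat(name.clone()).take(n_rows));
        out.value.extend_from_slice(values);
    }
    out
}
```

With 15 rows and 1,000 stacked columns, every output column ends up with 15,000 entries.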

Here is my Polars eager version:

let n = 1000;
let sample_cols = (0..n).collect::<Vec<i32>>()
    .par_iter()
    .map(|l| format!("{}", l))
    .collect::<Vec<String>>();

let mut df = df.melt(&["A", "B", "C", "D"], sample_cols).unwrap();

sample_cols is a Vec<String> containing the names of the columns to be stacked: the strings "0" through "999" for the 1,000 samples.
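As an aside, the column names can probably be built without rayon, since formatting a few thousand integers is cheap. A minimal sketch, where `sample_col_names` is a hypothetical helper name:

```rust
/// Hypothetical helper: builds the column names "0", "1", ..., "n-1" sequentially.
fn sample_col_names(n: usize) -> Vec<String> {
    (0..n).map(|i| i.to_string()).collect()
}
```

This should yield the same Vec<String> as the par_iter version above, without the intermediate Vec<i32>.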

Here is the lazy version:

let n = 1000;
let sample_cols = (0..n).collect::<Vec<i32>>()
    .par_iter()
    .map(|l| format!("{}", l))
    .collect::<Vec<String>>();

let melt_args = MeltArgs {
    id_vars: vec!["A".into(), "B".into(), "C".into(), "D".into()],
    value_vars: sample_cols,
    variable_name: None,
    value_name: None,
};

let mut df = df.lazy().melt(melt_args).collect()?;

Both Rust versions run at about the same speed, but much slower than R. With n = 100,000 the R code takes 0.45s on average, and sometimes as little as 0.23s, while both Rust versions take 13.5s to 14.5s.
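For anyone reproducing these numbers, a small std-only helper along these lines can report both the fastest and the mean time over several runs. It is only a sketch: `time_runs` is a hypothetical name, and it assumes the program is built with --release, since debug builds are typically much slower.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` `runs` times and returns
/// (fastest, mean) wall-clock durations.
fn time_runs<F: FnMut()>(runs: u32, mut f: F) -> (Duration, Duration) {
    assert!(runs > 0, "need at least one run");
    let mut fastest = Duration::MAX;
    let mut total = Duration::ZERO;
    for _ in 0..runs {
        let start = Instant::now();
        f();
        let elapsed = start.elapsed();
        fastest = fastest.min(elapsed);
        total += elapsed;
    }
    (fastest, total / runs)
}
```

The closure passed in would wrap one melt call on a fresh copy of the data, so each run measures the same work.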

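If you would like to run it yourself, the following generates dummy data and runs the melt. Put the statements after multi_rnorm inside a function that returns a Result (they use ?), and make sure to use only the eager or the lazy version at a time: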

use rand_distr::{Normal, Distribution};
use rayon::prelude::*;
use ndarray::Array2;
#[macro_use]
extern crate fstrings;
use polars::prelude::*;
use std::time::Instant;

fn multi_rnorm(n: usize, means: Vec<f64>, sds: Vec<f64>) -> Array2<f64> {

    let mut preds: Array2<f64> = Array2::zeros((means.len(), n));

    // Fill each row in parallel: row i holds n draws from N(means[i], sds[i]).
    preds.axis_iter_mut(ndarray::Axis(0)).into_par_iter().enumerate().for_each(|(i, mut row)| {
        let mut rng = rand::thread_rng();
        // The distribution is the same for the whole row, so build it once.
        let normal = Normal::new(means[i], sds[i]).unwrap();
        for j in 0..n {
            row[j] = normal.sample(&mut rng);
        }
    });
    preds
}

let n = 100000;

let means: Vec<f64> = vec![0.0; 15];
let sds: Vec<f64> = vec![1.0; 15];
let preds = multi_rnorm(n as usize, means, sds);

let mut df: DataFrame = DataFrame::new(
    preds.axis_iter(ndarray::Axis(1))
        .into_par_iter()
        .enumerate()
        .map(|(i, col)| {
            Series::new(
                &f!("{i}"),
                col.to_vec()
            )
        })
        .collect::<Vec<Series>>()
    )?;

// Names of the n sample columns to stack: "0", "1", ..., "n-1".
let sample_cols = (0..n).collect::<Vec<i32>>()
    .par_iter()
    .map(|l| format!("{}", l))
    .collect::<Vec<String>>();

df.with_column(Series::new("A", &["1", "2", "3", "1", "2", "3'", "1", "2", "3", "1", "2", "3", "1", "2", "3"]));
df.with_column(Series::new("B", &["1", "1", "1", "2", "2", "2", "3", "3", "3", "4", "4", "4", "5", "5", "5"]));
df.with_column(Series::new("C", &["1", "2", "3", "1", "2", "3'", "1", "2", "2", "1", "2", "3'", "1", "2", "3"]));
df.with_column(Series::new("D", (0..df.shape().0 as i32).collect::<Vec<i32>>()));

let melt_args = MeltArgs {
    id_vars: vec!["A".into(), "B".into(), "C".into(), "D".into()],
    value_vars: sample_cols,
    variable_name: None,
    value_name: None,
};

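// Lazy version. It consumes df and melt_args, so comment it out when timing the eager version.
let start = Instant::now();
let mut df = df.lazy().melt(melt_args).collect()?;
let duration = start.elapsed();
println!("{:?}", duration);

// Eager version. It needs sample_cols, which melt_args takes ownership of,
// so comment out melt_args and the lazy block above when timing this one.
let start = Instant::now();
let mut df = df.melt(&["A", "B", "C", "D"], &sample_cols).unwrap();
let duration = start.elapsed();
println!("{:?}", duration);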

Solution

I submitted an issue on GitHub, and the existing implementation was improved from O(n^2) to O(n); it is now faster than R. The fix is not part of the latest release yet, so you will need to install Polars from GitHub (as a git dependency in Cargo.toml) instead of from crates.io.
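To illustrate the kind of difference O(n^2) versus O(n) makes when stacking n columns, here is a std-only sketch. It is an assumption for illustration only, not a description of what Polars did before or after the fix: one common source of quadratic cost is re-copying everything stacked so far each time a column is appended. `stack_recopy` and `stack_prealloc` are hypothetical names.

```rust
/// Quadratic pattern: builds a fresh vector for every column, re-copying
/// everything stacked so far. Returns the result and the number of element copies.
fn stack_recopy(cols: &[Vec<f64>]) -> (Vec<f64>, usize) {
    let mut acc: Vec<f64> = Vec::new();
    let mut copies = 0;
    for col in cols {
        let mut next = Vec::with_capacity(acc.len() + col.len());
        next.extend_from_slice(&acc); // copies all previously stacked values again
        next.extend_from_slice(col);
        copies += acc.len() + col.len();
        acc = next;
    }
    (acc, copies)
}

/// Linear pattern: allocates the final length once and copies each element once.
fn stack_prealloc(cols: &[Vec<f64>]) -> (Vec<f64>, usize) {
    let total: usize = cols.iter().map(Vec::len).sum();
    let mut acc = Vec::with_capacity(total);
    let mut copies = 0;
    for col in cols {
        acc.extend_from_slice(col);
        copies += col.len();
    }
    (acc, copies)
}
```

Both functions return the same stacked vector, but the copy count of the first grows with the square of the number of columns, which is the sort of gap that only becomes painful at n = 100,000.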