rust, rust-polars

How do I use `ndarray_stats::CorrelationExt` on a `polars::prelude::DataFrame`?


I'm trying to calculate the covariance of a data frame in Rust. The ndarray_stats crate defines such a function for arrays, and I can produce an array from a DataFrame using to_ndarray. The compiler is happy if I use the example from the documentation (a), but it rejects the same call on an Array2 produced from a DataFrame (mat):

use polars::prelude::*;
use ndarray_stats::CorrelationExt;

fn cov(df: &DataFrame) -> Vec<f64> {
    // Both of these are Array2<f64>s
    let mat = df.to_ndarray::<Float64Type>().unwrap();
    let a = arr2(&[[1., 3., 5.], [2., 4., 6.]]);

    let x = a.cov(1.).unwrap();
    let y = mat.cov(1.).unwrap();
}
   |
22 |     let y = mat.cov(1.).unwrap();
   |                 ^^^ method not found in `ndarray::ArrayBase<ndarray::data_repr::OwnedRepr<f64>, ndarray::dimension::dim::Dim<[usize; 2]>>`

Why does the compiler allow the definition of x but not y? How can I fix the code such that y can be assigned?


Solution

  • It is a dependency version mismatch. polars-core (as of version 0.14.7) depends on ndarray 0.13.x, whereas ndarray-stats 0.5 requires ndarray 0.15. Since your project also uses the latest version of ndarray directly, the 2D array type of x is compatible with the extension trait CorrelationExt provided by ndarray-stats, but the type of y is not.

    Regardless of the nature of a type in a library, once multiple semver-incompatible versions of a library are included, their types will typically not be interchangeable. In other words, even though these Array2<_> may appear to be the same type, they are treated as different types by the compiler.

    To find crates that appear in multiple versions in a package, inspect the output of cargo tree -d, which lists only duplicated dependencies and, for each one, the inverted tree of crates that depend on it. Duplicates are not a problem in themselves; problems arise when the project passes a type from one version of a crate to an API built against another version.

    The lowest common denominator at the time of writing is to downgrade ndarray to 0.13 and ndarray-stats to 0.3, which also has the method cov. It may also be worth looking into contributing to the polars project in order to update ndarray there.
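
The type mismatch described above can be reproduced without any external crates. The following is only a minimal sketch: the modules `ndarray_v013` and `ndarray_v015`, the trait `SumExt`, and the helper `to_v015` are hypothetical stand-ins invented for illustration, not the real ndarray or ndarray-stats API. Two structurally identical types live in different modules (playing the role of two crate versions), and the extension trait is implemented for only one of them.

```rust
// Two modules stand in for two semver-incompatible versions of one crate.
// All names are hypothetical; the real crates are not involved.
mod ndarray_v013 {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Array2 {
        pub data: Vec<f64>,
    }
}

mod ndarray_v015 {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Array2 {
        pub data: Vec<f64>,
    }
}

// Stand-in for an extension trait that is implemented only for the
// "newer" type, much like a trait built against one crate version.
pub trait SumExt {
    fn total(&self) -> f64;
}

impl SumExt for ndarray_v015::Array2 {
    fn total(&self) -> f64 {
        self.data.iter().sum()
    }
}

// Explicitly rebuilds the "older" value as the "newer" type by copying its data.
pub fn to_v015(old: &ndarray_v013::Array2) -> ndarray_v015::Array2 {
    ndarray_v015::Array2 {
        data: old.data.clone(),
    }
}

fn main() {
    let x = ndarray_v015::Array2 { data: vec![1.0, 2.0, 3.0] };
    let y = ndarray_v013::Array2 { data: vec![1.0, 2.0, 3.0] };

    // Compiles: the trait is implemented for this exact type.
    println!("{}", x.total());

    // Fails with "method not found" if uncommented: the name and layout
    // match, but the compiler sees a different type.
    // println!("{}", y.total());

    // After an explicit conversion the method is available again.
    println!("{}", to_v015(&y).total());
}
```

Aligning the dependency versions, as suggested above, avoids the need for any such conversion, which in this sketch costs a full copy of the data.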