
NumPy and Pandas counterparts in .NET or .NET Core


In ML.NET, what are the counterparts of the NumPy and Pandas Python libraries?


Solution

  • Here are all the available .NET counterparts that I know of:

    NumPy

    There are a few Tensor type proposals in dotnet/corefx:

    There is also an implementation of NumPy made by the SciSharp org.

    Pandas

    On dotnet/corefx there is a DataFrame Discussion issue, which has spawned a dotnet/corefxlab project to implement a C# DataFrame library similar to Pandas.

    There are also other DataFrame implementations:

    ML.NET

    In ML.NET, IDataView is an interface that abstracts the underlying storage for tabular data, e.g. a DataFrame. It doesn't offer the rich API of a Pandas DataFrame; instead, it supports reading data from any underlying source, for example a text file, a SQL table, or in-memory objects.
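
    As a rough illustration of the "in-memory objects" case, the sketch below wraps a small array in an IDataView and prints the inferred schema. The HousingRow type and its values are invented for this example, and it assumes the Microsoft.ML NuGet package is referenced.

    ```csharp
    using System;
    using Microsoft.ML;

    // Hypothetical row type, made up for this sketch.
    public class HousingRow
    {
        public float Size { get; set; }
        public float Price { get; set; }
    }

    public static class Program
    {
        public static void Main()
        {
            var mlContext = new MLContext();

            // Made-up sample data held in ordinary .NET objects.
            var rows = new[]
            {
                new HousingRow { Size = 1.1f, Price = 1.2f },
                new HousingRow { Size = 1.9f, Price = 2.3f },
            };

            // Expose the in-memory objects through the IDataView abstraction.
            IDataView data = mlContext.Data.LoadFromEnumerable(rows);

            // The schema is inferred from the public members of HousingRow.
            foreach (var column in data.Schema)
            {
                Console.WriteLine($"{column.Name}: {column.Type}");
            }
        }
    }
    ```

    Note that this only gives you a schema and a way to stream rows into a pipeline. It does not provide Pandas-style slicing or aggregation.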

    There currently isn't a "data exploration" API in ML.NET v1.0 like the one a Pandas DataFrame provides. The current plan is for the corefxlab DataFrame class to implement IDataView, so that you can use DataFrame for data exploration and feed it directly into ML.NET.

    UPDATE: For a "data exploration" API similar to Pandas, check out the Microsoft.Data.Analysis package, which is currently in preview. It implements IDataView and can be fed directly into ML.NET to train or make predictions.
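
    A minimal sketch of what that might look like, assuming the Microsoft.Data.Analysis preview package is referenced. The column names and values are made up, and since the package is in preview, member names may differ between versions (older builds, for instance, may expose columns as df["Price"] rather than df.Columns["Price"]).

    ```csharp
    using System;
    using Microsoft.Data.Analysis;
    using Microsoft.ML;

    public static class Program
    {
        public static void Main()
        {
            // Made-up columns, purely for illustration.
            var size = new PrimitiveDataFrameColumn<float>("Size", new[] { 1.1f, 1.9f, 2.8f });
            var price = new PrimitiveDataFrameColumn<float>("Price", new[] { 1.2f, 2.3f, 3.0f });
            var df = new DataFrame(size, price);

            // Pandas-like exploration: row count and a column aggregate.
            Console.WriteLine(df.Rows.Count);
            Console.WriteLine(df.Columns["Price"].Mean());

            // DataFrame implements IDataView, so it can be handed to ML.NET as-is.
            IDataView data = df;
            Console.WriteLine(data.Schema.Count);
        }
    }
    ```

    From here, the IDataView reference can be passed wherever ML.NET expects training or prediction input.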