I am using Deedle from C#, and windowing over a frame is very slow compared with the same operation on a series. For example, for a series and a frame of similar size I am seeing 60 ms vs 3500 ms (series vs. frame).
Has anyone seen this before?
```csharp
using System;
using System.Collections.Generic;
using Deedle;

var msftRaw = Frame.ReadCsv(@"C:\Users\olivi\source\repos\ConsoleApp\MSFT.csv");
var msft = msftRaw.IndexRows<DateTime>("Date").SortRowsByKey();
var rollingFrame = msft.Window(60); // 7700 ms
var openSeries = msft.GetColumn<double>("Open");
var rollingSeries = openSeries.Window(60); // 14 ms
var oneSeriesFrame = Frame.FromColumns(new Dictionary<string, Series<DateTime, double>> { { "Open", openSeries } });
var rollingFakeFrame = oneSeriesFrame.Window(60); // 3300 ms
```
This is a common operation when working with financial time series, for example computing a rolling correlation between prices, or computing rolling realized volatility subject to a condition on another price series.
I found a workaround for the performance issue: window each series individually, join the windowed series into a frame so that they are aligned by date, and write the processing function against the frame rows, selecting each series inside that function.
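To make the second use case concrete, here is a plain C# sketch that does not use Deedle at all. It assumes two price arrays that are already aligned by date, takes realized volatility to be the square root of the sum of squared log returns over the window (one common definition, an assumption here), and gates each window on a caller-supplied predicate over the other series. `ConditionalVol`, `Rolling`, and `RealizedVol` are hypothetical names for illustration, not Deedle API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: rolling realized volatility of series `a`, computed only
// for windows where a condition on the aligned series `b` holds.
public static class ConditionalVol
{
    // Realized volatility assumed to be sqrt(sum of squared log returns).
    public static double RealizedVol(IReadOnlyList<double> prices)
    {
        double sumSq = 0;
        for (int i = 1; i < prices.Count; i++)
        {
            double r = Math.Log(prices[i] / prices[i - 1]);
            sumSq += r * r;
        }
        return Math.Sqrt(sumSq);
    }

    // Returns one value per qualifying window, keyed by the position of the
    // window's last element. `a` and `b` are assumed aligned by date.
    public static SortedDictionary<int, double> Rolling(
        double[] a, double[] b, int window, Func<IReadOnlyList<double>, bool> condition)
    {
        if (a.Length != b.Length) throw new ArgumentException("Series must be aligned.");
        if (window < 2) throw new ArgumentException("Window must hold at least two prices.");

        var result = new SortedDictionary<int, double>();
        for (int end = window - 1; end < a.Length; end++)
        {
            int start = end - window + 1;
            IReadOnlyList<double> wb = new ArraySegment<double>(b, start, window);
            if (!condition(wb)) continue; // skip windows where the condition on b fails
            result[end] = RealizedVol(new ArraySegment<double>(a, start, window));
        }
        return result;
    }
}
```

For example, `ConditionalVol.Rolling(closes, otherCloses, 60, w => w[w.Count - 1] > w[0])` would keep only the windows in which the other series ended higher than it started; the predicate is just an illustrative choice.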
Continuing from the example above:
```csharp
// Pearson correlation between the "Open" and "Close" windows of one row.
private static double CalculateRealizedCorrelation(ObjectSeries<string> objectSeries)
{
    var openSeries = objectSeries.GetAs<Series<DateTime, double>>("Open");
    var closeSeries = objectSeries.GetAs<Series<DateTime, double>>("Close");
    return MathNet.Numerics.Statistics.Correlation.Pearson(openSeries.Values, closeSeries.Values);
}

// Window each column individually; each value is a 60-row Series<DateTime, double>.
var rollingAgg = new Dictionary<string, Series<DateTime, Series<DateTime, double>>>();
foreach (var column in msft.ColumnKeys)
{
    rollingAgg[column] = msft.GetColumn<double>(column).Window(60);
}

// Joining the windowed series into a frame aligns the windows by date.
var rollingDf = Frame.FromColumns(rollingAgg);
var rollingCorr = rollingDf.Rows.Select(kvp => CalculateRealizedCorrelation(kvp.Value));
```
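To sanity-check the Deedle result, the same idea (window each series separately over identical positions, then combine the two windows) can be written in plain C# over two arrays. This sketch assumes the arrays are already aligned by date; `RollingStats`, `Pearson`, and `RollingCorrelation` are hypothetical helper names, and the Pearson formula is written out by hand rather than calling MathNet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain C# sketch (no Deedle) of a rolling correlation over aligned arrays.
public static class RollingStats
{
    // Pearson correlation of two equal-length sequences.
    // A constant window gives 0/0, i.e. NaN.
    public static double Pearson(IReadOnlyList<double> x, IReadOnlyList<double> y)
    {
        if (x.Count != y.Count || x.Count < 2)
            throw new ArgumentException("Need two sequences of equal length >= 2.");
        double meanX = x.Average(), meanY = y.Average();
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < x.Count; i++)
        {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.Sqrt(varX * varY);
    }

    // Windows each series separately over the same positions and returns one
    // correlation per window, keyed by the position of the window's last element.
    public static SortedDictionary<int, double> RollingCorrelation(double[] x, double[] y, int window)
    {
        if (x.Length != y.Length) throw new ArgumentException("Series must be aligned.");
        if (window < 2) throw new ArgumentException("Window must be at least 2.");

        var result = new SortedDictionary<int, double>();
        for (int end = window - 1; end < x.Length; end++)
        {
            int start = end - window + 1;
            result[end] = Pearson(new ArraySegment<double>(x, start, window),
                                  new ArraySegment<double>(y, start, window));
        }
        return result;
    }
}
```

`ArraySegment<double>` is used so each window is a view over the original array rather than a copy.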