Search code examples
c#dataframedeedleohlc

Deedle Resample on Price column and return Time field along with that


Trying to use Deedle to parse csv file and produce Open High Low Close values based on lastPrice value

Bellow is the sample from csv that contains tick data (each line a is a tick interval):

"time", "lastPrice", "bidPrice", "askPrice", "volume" 
20230208 050001 0040000;4175.25;4175;4175.25;3
20230208 050001 1040000;4175.25;4175.25;4175.5;1
20230208 050001 1040000;4175.25;4175.25;4175.5;2
20230208 050001 3520000;4175.25;4175;4175.25;1
20230208 050007 1880000;4175.25;4175;4175.25;1
20230208 050007 8520000;4175.25;4175;4175.25;1
20230208 050008 8280000;4175.25;4175;4175.25;1
20230208 050010 2920000;4175;4175;4175.25;1
20230208 050010 2920000;4175;4175;4175.25;1
20230208 050010 2920000;4175;4175;4175.25;1
20230208 050010 2920000;4175;4175;4175.25;1
20230208 050010 2920000;4175;4175;4175.25;1
20230208 050010 3520000;4175;4175;4175.25;1

Following is the example as far as making use of ResampleEquivalence extension on lastPrice column and returning OHLC values based on 2000 ticks interval (all in all to resample 1 tick and produce 2000 ticks interval)

var df = Frame.ReadCsv("./Data/ES 03-23.Last.txt", hasHeaders: false, separators: ";").IndexColumnsWith(new[] { "time", "lastPrice", "bidPrice", "askPrice", "volume" });
            
            var prices = df["lastPrice"];
            var agg = prices.ResampleEquivalence(kv => {
                return kv / 2000;
            }
            ).Select(data =>
              new 
              {
                  Timestamp = data.ToString(),
                  Open = data.Value.FirstValue(),
                  High = data.Value.Max(),
                  Close = data.Value.LastValue(),
                  Low = data.Value.Min()
              });

            Frame.FromRecords(agg).Print();

It all seem to work as expected, but can't seem to figure out how to output Time along with OHLC which should be MAX(Time) for that 2000 tick interval?


Solution

  • You probably need to do a bit more work here. Right now, you are resampling based on an ordinal index (row number), so you get 2000 rows in each chunk, regardless of the timestamp.

    I suspect you do not want that - and you instead want to resample based on the time specified in the time column. To do this, you first need to turn time into an index of the frame. Deedle does not allow duplicate entries in the index (which seem to be present in your data), so the trick is to use Tuple<Time, int> as the index with the time and an index.

    I'm not sure how to best parse your date & time, so for simplicity, I just take the last bit after the last space and turn that into an int. You need to do something more clever, but it illustrates the resampling well enough:

    var df =
      Frame.ReadCsv("c:/temp/p.csv", hasHeaders: false, separators: ";")
        .IndexColumnsWith(new[] { "time", "lastPrice", "bidPrice", "askPrice", "volume" });
    
    df.AddColumn("Index", df.Rows.Select(kvp => {
        var time = Int32.Parse(kvp.Value.GetAs<string>("time").Split(' ')[2]);
        var idx = kvp.Key;
        return Tuple.Create(time, idx);
      }));
    
    var dfByTime = df.IndexRows<Tuple<int, int>>("Index", false).SortByRowKey();
    

    Now we have a data frame indexed by the newly created Index, which is a pair of time (represented as int) and an index (just a number to disambiguate multiple rows with the same time).

    You can now do the resampling based on the time divided by 2000 (in your sample input, each 2000 chunk will have just a single row if my parsing is correct):

    var prices = dfByTime["lastPrice"];
    var agg = 
      prices.ResampleEquivalence(kv => kv.Item1 / 2000)
        .SelectKeys(kvp => kvp.Key * 2000)
      .Select(data =>
      new
      {
        Timestamp = data.Key.ToString(),
        Open = data.Value.FirstValue(),
        High = data.Value.Max(),
        Close = data.Value.LastValue(),
        Low = data.Value.Min()
      });
    
    Frame.FromRecords(agg).Print();
    

    The new Key here is the time divided by 2000. I use SelectKeys to turn that back into original tick. This is almost what you need, except that I think it gives you the first tick in the range, rather than the last. I suspect you could just use kvp.Key*2000+2000 to get the last tick in the range.