I have Python code that uses trade data as input. The trade data is taken from here: https://data.binance.vision/?prefix=data/spot/daily/trades/ETHUSDT/.
import pandas as pd
df = pd.read_csv("ETHUSDT-trades-2022-12-16.csv",
names = ["id", "price", "qty", "quoteQty", "time", "makerBuy", "bestPrice"])
df["time"] = pd.to_datetime(df["time"], unit = "ms")
df.set_index("time", inplace = True)
df.price.resample("240T").agg({
"open": "first",
"high": "max",
"low": "min",
"close": "last"
})
Input:
id price qty quoteQty makerBuy bestPrice
time
2022-12-16 00:00:00.000 1042612060 1266.53 0.0102 12.918606 True True
2022-12-16 00:00:00.000 1042612061 1266.53 0.0012 1.519836 True True
2022-12-16 00:00:00.005 1042612062 1266.54 0.0111 14.058594 False True
2022-12-16 00:00:00.008 1042612063 1266.53 0.0097 12.285341 True True
2022-12-16 00:00:00.011 1042612064 1266.53 0.0181 22.924193 True True
... ... ... ... ... ... ...
2022-12-16 23:59:59.991 1043495126 1166.82 0.0548 63.941736 True True
2022-12-16 23:59:59.994 1043495127 1166.83 0.0105 12.251715 False True
2022-12-16 23:59:59.995 1043495128 1166.83 0.0263 30.687629 False True
2022-12-16 23:59:59.997 1043495129 1166.82 0.0089 10.384698 True True
2022-12-16 23:59:59.999 1043495130 1166.83 0.0090 10.501470 False True
883071 rows × 6 columns
Output:
time open high low close
2022-12-16 00:00:00 1266.53 1273.87 1265.65 1271.79
2022-12-16 04:00:00 1271.78 1280.06 1267.13 1277.77
2022-12-16 08:00:00 1277.77 1279.27 1205.50 1212.83
2022-12-16 12:00:00 1212.84 1218.23 1195.04 1203.63
2022-12-16 16:00:00 1203.64 1204.58 1183.37 1202.50
2022-12-16 20:00:00 1202.50 1211.99 1155.32 1166.83
I want to migrate the Python code into C#. In fact, I actually did but the thing is, I want to use an actual DataFrame
framework and not just do everything manually. Microsoft.Data.Analysis
lacks a lot of features, so I decided to give Deedle
a try.
using Deedle;
var df = Frame.ReadCsv("ETHUSDT-trades-2022-12-16.csv",
hasHeaders: false);
var series = df["Column2"]; // Columns is supposed to be price
series.ResampleEquivalence(date => TimeSpan.FromMinutes(240), aggregate =>
{
});
df.Print();
The problem is that I'm unable to set actual names for the columns and do the resampling part. How is that going to work here?
using System.Globalization;
using CsvHelper;
using CsvHelper.Configuration;
using TickDataCustom;
var configuration = new CsvConfiguration(CultureInfo.InvariantCulture) { HasHeaderRecord = false };
using var reader = new StreamReader("ETHUSDT-trades-2022-12-16.csv");
using var csv = new CsvReader(reader, configuration);
csv.Context.RegisterClassMap<TradeMap>();
var records = csv.GetRecords<Trade>();
var source = records
.Select(x => new Trade2
{
Id = x.Id,
Price = x.Price,
Quantity = x.Quantity,
QuoteQuantity = x.QuoteQuantity,
Time = DateTimeOffset.FromUnixTimeMilliseconds(x.Timestamp).UtcDateTime,
MakerBuy = x.MakerBuy,
BestPrice = x.BestPrice
})
.ToList();
var period = TimeSpan.FromMinutes(240);
var ohlcv = source
.AsParallel()
.OrderBy(p => p.Time)
.Select(d => new
{
d.Time,
d.Price,
Span = d.Time.Ticks / period.Ticks
})
.GroupBy(d => d.Span)
.Select(g => new Ohlc
{
Timestamp = new DateTime(period.Ticks * g.Key),
Open = g.OrderBy(s => s.Time).First().Price,
Close = g.OrderBy(s => s.Time).Last().Price,
Low = g.Min(d => d.Price),
High = g.Max(d => d.Price)
})
.OrderBy(o => o.Timestamp);
foreach (var ohlc in ohlcv)
{
Console.WriteLine($"T:{ohlc.Timestamp}, O:{ohlc.Open:f2}, H:{ohlc.High:f2}, L:{ohlc.Low:f2}, C:{ohlc.Close:f2}");
}
Console.ReadLine();
public class TradeMap : ClassMap<Trade>
{
public TradeMap()
{
Map(p => p.Id).Index(0);
Map(p => p.Price).Index(1);
Map(p => p.Quantity).Index(2);
Map(p => p.QuoteQuantity).Index(3);
Map(p => p.Timestamp).Index(4);
Map(p => p.MakerBuy).Index(5);
Map(p => p.BestPrice).Index(6);
}
}
public class Trade
{
[Index(0)] public string Id { get; set; } = default!;
[Index(1)] public decimal Price { get; set; }
[Index(2)] public decimal Quantity { get; set; }
[Index(3)] public decimal QuoteQuantity { get; set; }
[Index(4)] public long Timestamp { get; set; }
[Index(5)] public bool MakerBuy { get; set; }
[Index(6)] public bool BestPrice { get; set; }
}
public class Ohlc
{
public DateTime Timestamp { get; set; }
public decimal Open { get; set; }
public decimal High { get; set; }
public decimal Low { get; set; }
public decimal Close { get; set; }
public decimal Volume { get; set; }
}
public class Trade2
{
public string Id { get; set; } = default!;
public decimal Price { get; set; }
public decimal Quantity { get; set; }
public decimal QuoteQuantity { get; set; }
public DateTime Time { get; set; }
public bool MakerBuy { get; set; }
public bool BestPrice { get; set; }
}
Below is my solution using Deedle.
First, we load the data as you did. You can then use IndexColumnsWith
to provide column names. In Deedle, you are not allowed to have duplicate row keys, so I'll create a frame with compisite key - formed by time
and id
(to disambiguate). This will let us nicely aggregate later. To do this, you can use IndexRowsUsing
which lets you select a new row key for each row:
using Deedle;
var df =
Frame.ReadCsv("c:/temp/ETHUSDT-trades-2022-12-16.csv", hasHeaders: false)
.IndexColumnsWith(new[] { "id", "price",
"qty", "quoteQty", "time", "makerBuy", "bestPrice" })
.IndexRowsUsing(row =>
Tuple.Create(DateTimeOffset.FromUnixTimeMilliseconds
((long)row["time"]), (int)row["id"]));
To do the aggregation, we'll use ResampleEquivalence
(to create the groups) followed by Select
(to calcualte OHLC for each group):
var prices = df["price"];
var start = new DateTime(2022, 1, 1);
var agg = prices.ResampleEquivalence(kv => {
// Calculate key - this is the start of the time-block to
// which the row belongs (i.e., 0:00, 4:00, 8:00, etc.)
var h = (int)((kv.Item1 - start).TotalMinutes / 240.0);
return start.AddMinutes(h * 240);
}).Select(data =>
new {
open = data.Value.FirstValue(),
high = data.Value.Max(),
close = data.Value.LastValue(),
low = data.Value.Min() });
This produces a series of anonymous records, which you can then turn nicely back into a data frame:
Frame.FromRecords(agg).Print();