ML.NET: Convert type long to DateTime with TypeConvertingTransformer


I have an IDataView with a Timestamp column of type long.
I'm trying to convert that column from long to DateTime with a transformer,
but I get the following error.

Error: System.ArgumentOutOfRangeException: source column 'Timestamp' with item type 'Int64' is not compatible with destination type 'DateTime' (Parameter 'inputSchema')

Is it possible to convert long to DateTime by TypeConvertingTransformer?

using Microsoft.ML;
using Microsoft.ML.Data;

MLContext context = new(seed: 1);

var rawData = new InputData[] {
    new() { Timestamp = 1590085800 },
    new() { Timestamp = 1590089400 },
    new() { Timestamp = 1590154200 },
    new() { Timestamp = 1590157800 },
    new() { Timestamp = 1590161400 },
    new() { Timestamp = 1674228600 },
    new() { Timestamp = 1674232200 },
    new() { Timestamp = 1674235800 },
    new() { Timestamp = 1674239400 },
    new() { Timestamp = 1674243000 },
};

var data = context.Data.LoadFromEnumerable(rawData);

var pipeline = context.Transforms.Conversion.ConvertType(
    "DateTime", "Timestamp", DataKind.DateTime);

var transformer = pipeline.Fit(data);

class InputData
{
    public long Timestamp { get; set; }
}

class TransformedData : InputData
{
    public DateTime DateTime { get; set; }
}

Solution

  • I'm not sure what the Timestamp long values represent, but as the error message says, ConvertType does not treat an Int64 source column as compatible with a DateTime destination, so this conversion fails when the pipeline is fitted. You can get this to work with a CustomMapping transform instead. Here's a sample of what the code could look like for that:

    using System;
    using Microsoft.ML;

    // Initialize MLContext
    var ctx = new MLContext();
    
    // Create input data
    var data = new InputData[]
    {
        new () { Timestamp=1590085800L },
        new () { Timestamp=1590089400L },
        new () { Timestamp=1590154200L },
        new () { Timestamp=1590157800L } 
    };
    
    // Load data into IDataView
    var dv = ctx.Data.LoadFromEnumerable(data);
    
    // Define CustomMapping transform
    var ConvertToDateTime = (InputData input, IntermediateData output) => 
    {
        // Copy the source value across, since IntermediateData also declares Timestamp
        output.Timestamp = input.Timestamp;
        // Assumes the long is a Unix timestamp in seconds; use FromUnixTimeMilliseconds for milliseconds
        output.ConvertedTimeStamp = DateTimeOffset.FromUnixTimeSeconds(input.Timestamp).UtcDateTime;
        // output.ConvertedTimeStamp = DateTime.UnixEpoch.AddSeconds(input.Timestamp); // Same result
    };
    
    // Create pipeline (a null contractName means the resulting transformer cannot be saved)
    var pipeline = ctx.Transforms.CustomMapping(ConvertToDateTime, contractName: null);
    
    // Apply pipeline to data
    var outputDv = pipeline.Fit(dv).Transform(dv);
    
    // Get converted column
    var convertedColumn = outputDv.GetColumn<DateTime>(nameof(IntermediateData.ConvertedTimeStamp));
    
    // Print out rows (formatted using the current culture)
    foreach (var c in convertedColumn)
    {
        Console.WriteLine(c);
    }
    
    public class InputData
    {
        public long Timestamp { get; set; }    
    }
    
    public class IntermediateData
    {
        public long Timestamp { get; set; }
        public DateTime ConvertedTimeStamp { get; set; }
    }
    

    When you run this, the output should look something like the following. The times are UTC, and the exact date format depends on the current culture:

    5/21/2020 6:30:00 PM
    5/21/2020 7:30:00 PM
    5/22/2020 1:30:00 PM
    5/22/2020 2:30:00 PM
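
The comment in the sample notes that the same approach also works for millisecond timestamps. As a minimal sketch of that conversion on its own, using only the .NET base class library (no ML.NET) and assuming the values are Unix epoch times, the two cases could look like this. FromUnixSeconds and FromUnixMilliseconds are hypothetical helper names introduced here for illustration; UtcDateTime is used so the result carries DateTimeKind.Utc.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers (not part of ML.NET): convert Unix epoch values to a UTC DateTime.
static DateTime FromUnixSeconds(long seconds) =>
    DateTimeOffset.FromUnixTimeSeconds(seconds).UtcDateTime;

static DateTime FromUnixMilliseconds(long milliseconds) =>
    DateTimeOffset.FromUnixTimeMilliseconds(milliseconds).UtcDateTime;

// First value from the question, assumed to be seconds since 1970-01-01 UTC.
long seconds = 1590085800L;

DateTime fromSeconds = FromUnixSeconds(seconds);
DateTime fromMillis = FromUnixMilliseconds(seconds * 1000);

// The round-trip ("o") format does not depend on the current culture.
Console.WriteLine(fromSeconds.ToString("o", CultureInfo.InvariantCulture)); // 2020-05-21T18:30:00.0000000Z
Console.WriteLine(fromSeconds == fromMillis);                               // True
Console.WriteLine(fromSeconds.Kind);                                        // Utc
```

If the source data turns out to be in milliseconds, only the call inside the mapping action needs to change; the rest of the pipeline stays the same.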