Does mlContext.Data.CreateEnumerable use the init accessor?


Note: I'm new to C# so I might have misunderstood something about object initializers.

Consider this Minimal Reproducible Example (complete code at the end)

  • We create an IDataView from an IEnumerable.
  • We want to convert it back into an IEnumerable of a different class that maps to the same column names.
  • This second class has two properties that are arrays. If a property is get-only, mlContext.Data.CreateEnumerable does not populate it (the array keeps its initial zeros and is never updated). If the property has an init or set accessor, it is populated.

I thought that because the property is an array, it wouldn't matter that it is get-only: I can get the reference to the array and then modify its elements freely. However, it appears that mlContext.Data.CreateEnumerable wants to replace the array with another one. Weirdly enough, if the property is get-only, I don't even get an error. So I'm a bit confused.

Reading the complete MRE might yield a clearer view of what I don't understand:

using Microsoft.ML;
using Microsoft.ML.Data;

public class Program
{
    public static void Main()
    {
        var mlContext = new MLContext();

        SampleInput a = new SampleInput() { Value1 = new float[] { 1.0f }, Value2 = new float[] { 1.0f } };
        SampleInput b = new SampleInput() { Value1 = new float[] { 2.0f }, Value2 = new float[] { 2.0f } };
        List<SampleInput> inputData = [a, b];

        // Convert input data to IDataView
        IDataView inputView = mlContext.Data.LoadFromEnumerable(inputData);
        // Use CreateEnumerable to convert back to a list of SampleOutput
        var outputData = mlContext.Data.CreateEnumerable<SampleOutput>(inputView, reuseRowObject: false);
        // Print the output
        foreach (var output in outputData)
        {
            Console.WriteLine($"Field1: {string.Join(", ", output.Field1)}");  // 1 and 2, I'm happy about that
            Console.WriteLine($"Field2: {string.Join(", ", output.Field2)}");  // Always 0????????? 
            // But didn't throw an error when it tried to initialize the array????
        }
    }
}

public class SampleInput
{
    [ColumnName("Value1")]
    [VectorType(1)]
    public float[] Value1 { get; set; }

    [ColumnName("Value2")]
    [VectorType(1)]
    public float[] Value2 { get; set; }
}


public class SampleOutput
{
    [ColumnName("Value1")]
    [VectorType(1)]
    public float[] Field1 { get; init; } = new float[1];

    [ColumnName("Value2")]
    [VectorType(1)]
    public float[] Field2 { get; } = new float[1];
}

Note: Rider itself says "Auto-property can be made get-only" which (obviously) is not true given the example.

It is fairly logical that mlContext.Data.CreateEnumerable uses the init accessor, but why does it not throw any error when I don't provide init? It clearly should (this works, this throws an error, but Field2 doesn't throw an error when mlContext.Data.CreateEnumerable fails to update it).

I tried looking at the source code but got lost.


Solution

  • A common way to implement mapping is to create an object of the target type and then invoke the setters of this new object. As far as I understood from the source code, it uses a naming convention to match properties of the source type to properties of the target type.

    Your code is perfectly valid. You did not declare a way to set the value, so the method did not set it: neither init nor set was accessible. Why didn't you add one? (A rhetorical question.) Maybe you don't care about certain columns and "ignore" them by omitting the corresponding properties. It would be unwise of the authors to always throw an exception.

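    As for the array, yes, you can modify the elements of the array even if the property is init-only or get-only. The accessor restricts only assignment to the property itself, not access to the members or indices of the object it refers to. As you observed, though, CreateEnumerable appears to assign a new array to the property instead of writing into the existing one, which is why it needs a setter.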
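To illustrate the difference between replacing the array and changing its elements, here is a minimal sketch. The Holder and AccessorDemo types are made up for this example and are not part of ML.NET. It shows that element writes succeed for both properties, while reflection reports a setter only for the init-only one.

```csharp
using System;
using System.Reflection;

// Hypothetical type for illustration only; not part of ML.NET.
public class Holder
{
    public float[] InitOnly { get; init; } = new float[1];
    public float[] GetOnly { get; } = new float[1];
}

public static class AccessorDemo
{
    // Writes into the existing arrays. Only the getter is involved,
    // so this compiles and works for both properties.
    public static Holder Build()
    {
        var holder = new Holder();
        holder.InitOnly[0] = 1f;
        holder.GetOnly[0] = 2f;
        // holder.GetOnly = new float[1];  // would not compile: no setter
        // holder.InitOnly = new float[1]; // would not compile: init-only outside an initializer
        return holder;
    }

    // Reports whether reflection sees a set or init accessor on the property.
    public static bool HasSetter(string propertyName)
    {
        PropertyInfo property = typeof(Holder).GetProperty(propertyName);
        return property != null && property.CanWrite;
    }
}
```

HasSetter("InitOnly") returns true and HasSetter("GetOnly") returns false, which matches the behavior you saw: a reflection-based mapper can assign to Field1 but has nothing to call for Field2.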

    About the Rider suggestion: Rider (and other IDEs such as Visual Studio) relies on static analysis of your code to understand how types are used. Nothing in your code uses the setter, but that does not mean nothing will attempt to use it at runtime. The IDE simply cannot know how types are used through reflection, especially by third-party dependencies. What you can do is suppress the warning from the quick actions, either just for this property or for the whole class or file.

    With all this in mind, your output class should look like this. You may change set to init if you prefer; it makes no difference in this scenario.

    public class SampleOutput
    {
        [ColumnName("Value1")]
        [VectorType(1)]
        public float[] Field1 { get; set; }
    
        [ColumnName("Value2")]
        [VectorType(1)]
        public float[] Field2 { get; set; }
    }
    

    Take a look at the Convert method below. The CreateEnumerable method might work somewhat like this under the hood. Notice how many warnings Rider generates.

    using System;
    using System.Linq;
    using System.Reflection;
    
    var input = new SampleInput()
    {
        Field1 = [3f, 4f],
        Field2 = [5f, 6f]
    };
    
    var output = Convert<SampleInput, SampleOutput>(input);
    
    Console.WriteLine($"Output Field1.Length: {output.Field1.Length}, Field2.Length: {output.Field2.Length}");
    
    K Convert<T, K>(T tObj)
    {
        var tProperties = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance); // get properties of source type
        var kProperties = typeof(K).GetProperties(BindingFlags.Public | BindingFlags.Instance); // get properties of target type
    
        var kObj = Activator.CreateInstance<K>(); // create object of target type (K needs a public parameterless constructor)
    
        foreach (var tProperty in tProperties) // iterate over properties of the source type
        {
            var kProperty = kProperties.FirstOrDefault(x => x.Name == tProperty.Name); // match property by name (or other convention)
            if (kProperty != null && kProperty.CanWrite) // make sure there is init; or set; present on the property
            {
                var value = tProperty.GetValue(tObj); // get value of the source object
                
                kProperty.SetValue(kObj, value); // write value to the target object
            }
        }
        
        return kObj;
    }
    
    public class SampleInput
    {
        public float[] Field1 { get; set; } = []; //warning: get is never used
        public float[] Field2 { get; set; } = []; //warning: get is never used
    }
    
    public class SampleOutput
    {
        public float[] Field1 { get; init; } = []; //warning: init is never used
        public float[] Field2 { get; } = [];
    }
    

    Output: Output Field1.Length: 2, Field2.Length: 0

    You can, of course, invert the if (kProperty != null && kProperty.CanWrite) condition and throw an exception when the property is missing on the target type or has no set or init accessor.
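A strict variant along those lines could look like the sketch below. StrictMapper and the three sample classes are hypothetical names chosen for this example. This is not how ML.NET behaves, only what a mapper that refuses to skip properties silently might look like.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical strict mapper for illustration; not part of ML.NET.
public static class StrictMapper
{
    public static TTarget Convert<TSource, TTarget>(TSource source)
        where TTarget : class, new()
    {
        var target = new TTarget();
        var targetProperties = typeof(TTarget).GetProperties(BindingFlags.Public | BindingFlags.Instance);

        foreach (var sourceProperty in typeof(TSource).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Match by name, as in the Convert method above.
            var targetProperty = targetProperties.FirstOrDefault(p => p.Name == sourceProperty.Name);

            // Inverted check: fail loudly instead of skipping the property.
            if (targetProperty == null || !targetProperty.CanWrite)
            {
                throw new InvalidOperationException(
                    $"Property '{sourceProperty.Name}' is missing or has no set/init accessor on {typeof(TTarget).Name}.");
            }

            targetProperty.SetValue(target, sourceProperty.GetValue(source));
        }

        return target;
    }
}

public class MapSource
{
    public float[] Field1 { get; set; } = new float[0];
    public float[] Field2 { get; set; } = new float[0];
}

public class WritableTarget
{
    public float[] Field1 { get; init; } = new float[0];
    public float[] Field2 { get; set; } = new float[0];
}

public class GetOnlyTarget
{
    public float[] Field1 { get; init; } = new float[0];
    public float[] Field2 { get; } = new float[0];
}
```

Converting a MapSource to WritableTarget copies both arrays, while converting it to GetOnlyTarget throws InvalidOperationException for Field2. Whether failing is preferable to skipping depends on whether callers are expected to map every column.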