Note: I'm new to C# so I might have misunderstood something about object initializers.
Consider this Minimal Reproducible Example (complete code at the end):
I create an IDataView object from an IEnumerable.
I then convert it back to an IEnumerable, but one that contains another type of object (though with the same column names).
If a property of that object is get-only, mlContext.Data.CreateEnumerable does not fill it (the elements are initialized to 0 and never updated). Otherwise, the property is updated/initialized by mlContext.Data.CreateEnumerable.
I thought that because the property is an array, it doesn't matter that it is get-only, because I can get it (get the reference to that object) and then modify its elements freely. However, it appears mlContext.Data.CreateEnumerable wants to replace that array with another one. Weirdly enough, if it is get-only, I don't even get an error. So I'm a bit confused.
Reading the complete MRE might yield a clearer view of what I don't understand:
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public class Program
{
    public static void Main()
    {
        var mlContext = new MLContext();
        SampleInput a = new SampleInput() { Value1 = new float[] { 1.0f }, Value2 = new float[] { 1.0f } };
        SampleInput b = new SampleInput() { Value1 = new float[] { 2.0f }, Value2 = new float[] { 2.0f } };
        List<SampleInput> inputData = [a, b];

        // Convert input data to IDataView
        IDataView inputView = mlContext.Data.LoadFromEnumerable(inputData);

        // Use CreateEnumerable to convert back to a sequence of SampleOutput
        var outputData = mlContext.Data.CreateEnumerable<SampleOutput>(inputView, reuseRowObject: false);

        // Print the output
        foreach (var output in outputData)
        {
            Console.WriteLine($"Field1: {string.Join(", ", output.Field1)}"); // 1 and 2, I'm happy about that
            Console.WriteLine($"Field2: {string.Join(", ", output.Field2)}"); // Always 0?
            // But it didn't throw an error when it tried to initialize the array?
        }
    }
}
public class SampleInput
{
    [ColumnName("Value1")]
    [VectorType(1)]
    public float[] Value1 { get; set; }

    [ColumnName("Value2")]
    [VectorType(1)]
    public float[] Value2 { get; set; }
}

public class SampleOutput
{
    [ColumnName("Value1")]
    [VectorType(1)]
    public float[] Field1 { get; init; } = new float[1];

    [ColumnName("Value2")]
    [VectorType(1)]
    public float[] Field2 { get; } = new float[1];
}
Note: Rider itself says "Auto-property can be made get-only" which (obviously) is not true given the example.
It is fairly logical that mlContext.Data.CreateEnumerable uses the init accessor, but why does it not throw any error when I don't provide init? It clearly should (this works, this throws an error, but Field2 doesn't throw an error when mlContext.Data.CreateEnumerable fails to update it).
I tried looking at the source code but got lost.
A common way to implement this kind of mapping is to create an object of the target type and then invoke the setters of that new object. As far as I understood from the source code, it uses a naming convention to match properties of the source type to properties of the target type.
The code looks perfectly fine: you did not declare a way to set the value, so the method did not set it, because neither init nor set was accessible. Why didn't you add one? (A rhetorical question.) Maybe you don't care about specific columns and "ignore" them by omitting specific properties. It would not be wise of the authors to always throw an exception.
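To make the "neither init nor set was accessible" point concrete, here is a minimal sketch of how reflection-based code can tell the three kinds of property apart. AccessorSample and AccessorProbe are hypothetical names made up for this illustration, and this is not ML.NET's actual code; it only uses the standard PropertyInfo.CanWrite check.

```csharp
using System;
using System.Reflection;

// Hypothetical sample type: one property per accessor combination.
public class AccessorSample
{
    public float[] WithSet { get; set; } = new float[1];
    public float[] WithInit { get; init; } = new float[1];
    public float[] GetOnly { get; } = new float[1];
}

public static class AccessorProbe
{
    // Returns true when reflection finds a set or init accessor on the property.
    public static bool IsWritable(string propertyName)
    {
        PropertyInfo property = typeof(AccessorSample).GetProperty(propertyName)
            ?? throw new ArgumentException($"No public property named {propertyName}.");
        return property.CanWrite;
    }
}
```

IsWritable("WithSet") and IsWritable("WithInit") both return true, while IsWritable("GetOnly") returns false, so a mapper that checks CanWrite before writing will quietly skip the get-only property instead of failing.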
As for assigning to the array's elements: yes, you can modify the elements of the array even if init was used (or the property is get-only), because the accessor only governs the assignment operator on the property itself, not the members or indices of the value it returns.
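A small sketch of that distinction, using hypothetical names (GetOnlyHolder, ElementWriteDemo) that are not part of the question's code: writing through the getter to an element compiles and works, while assigning a new array to the property does not compile.

```csharp
using System;

// Hypothetical type with a get-only array property, like Field2 in the question.
public class GetOnlyHolder
{
    public float[] Values { get; } = new float[1];
}

public static class ElementWriteDemo
{
    // Writes into the existing array through the getter and reads the value back.
    public static float WriteThroughGetter(float newValue)
    {
        var holder = new GetOnlyHolder();
        holder.Values[0] = newValue; // allowed: mutates the array the property already points to
        // holder.Values = new[] { newValue }; // not allowed: compile error CS0200, the property is read-only
        return holder.Values[0];
    }
}
```

So your intuition about arrays is right for code that writes into the existing array. A mapper that assigns a whole new array to the property needs a set or init accessor instead.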
About the Rider suggestion: Rider (and other IDEs like Visual Studio) analyzes your source code statically to understand how types are being used. There is no usage of the setter in your code, but that does not necessarily mean nothing will attempt to use it. The IDE simply can't know how types are used through reflection, especially by third-party dependencies. What you can do is suppress the warning by picking that option from the quick actions, either just for this property or for the whole class or file.
So, considering all this, your output class should look like this. You might want to change set to init, but it does not matter in this scenario:
public class SampleOutput
{
    [ColumnName("Value1")]
    [VectorType(1)]
    public float[] Field1 { get; set; }

    [ColumnName("Value2")]
    [VectorType(1)]
    public float[] Field2 { get; set; }
}
Take a look at the Convert method below; CreateEnumerable might look somewhat like this under the hood. Notice how many warnings Rider generates.
using System;
using System.Linq;
using System.Reflection;

var input = new SampleInput()
{
    Field1 = [3f, 4f],
    Field2 = [5f, 6f]
};
var output = Convert<SampleInput, SampleOutput>(input);
Console.WriteLine($"Output Field1.Length: {output.Field1.Length}, Field2.Length: {output.Field2.Length}");

K Convert<T, K>(T tObj)
{
    var tProperties = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance); // get properties of the source type
    var kProperties = typeof(K).GetProperties(BindingFlags.Public | BindingFlags.Instance); // get properties of the target type
    var kObj = (K)Activator.CreateInstance(typeof(K))!; // create an object of the target type
    foreach (var tProperty in tProperties) // iterate over properties of the source type
    {
        var kProperty = kProperties.FirstOrDefault(x => x.Name == tProperty.Name); // match the target property by name (or other convention)
        if (kProperty != null && kProperty.CanWrite) // make sure init; or set; is present on the property
        {
            var value = tProperty.GetValue(tObj); // get the value from the source object
            kProperty.SetValue(kObj, value); // write the value to the target object
        }
    }
    return kObj;
}
public class SampleInput
{
    public float[] Field1 { get; set; } = []; // warning: get is never used
    public float[] Field2 { get; set; } = []; // warning: get is never used
}

public class SampleOutput
{
    public float[] Field1 { get; init; } = []; // warning: init is never used
    public float[] Field2 { get; } = [];
}
Output: Output Field1.Length: 2, Field2.Length: 0
You can obviously flip the if (kProperty != null && kProperty.CanWrite) condition to throw an exception if the property is missing from the target type or cannot be written.
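For illustration, a strict variant of that idea could look like the sketch below. StrictMapper and the three sample types are hypothetical names invented for this example, and throwing InvalidOperationException is just one possible choice; nothing here is taken from ML.NET.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical sample types for the strict mapper.
public class StrictSource { public float[] Field1 { get; set; } = Array.Empty<float>(); }
public class WritableTarget { public float[] Field1 { get; init; } = Array.Empty<float>(); }
public class ReadOnlyTarget { public float[] Field1 { get; } = Array.Empty<float>(); }

public static class StrictMapper
{
    // Copies every public instance property of the source to the same-named target property.
    // Throws instead of skipping when the target property is missing or has no set/init accessor.
    public static TTarget Convert<TSource, TTarget>(TSource source) where TTarget : class, new()
    {
        var target = new TTarget();
        var targetProperties = typeof(TTarget).GetProperties(BindingFlags.Public | BindingFlags.Instance);
        foreach (var sourceProperty in typeof(TSource).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            var targetProperty = targetProperties.FirstOrDefault(p => p.Name == sourceProperty.Name);
            if (targetProperty is null || !targetProperty.CanWrite)
            {
                throw new InvalidOperationException(
                    $"Property '{sourceProperty.Name}' is missing or not writable on {typeof(TTarget).Name}.");
            }
            targetProperty.SetValue(target, sourceProperty.GetValue(source));
        }
        return target;
    }
}
```

With these types, StrictMapper.Convert<StrictSource, WritableTarget>(...) copies the array as before, while StrictMapper.Convert<StrictSource, ReadOnlyTarget>(...) throws, which is the behavior you expected from Field2.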