I'm working on an ML.NET project where I load data into an IDataView, apply an ONNX model transformation, and then access the transformed output.
My Input class is in the same assembly as my Output class and my Model class. When I mark my Input class as internal and its properties as internal, I get an error. When I mark the properties as public (the class is still internal), it works.
I thought that if the class was marked as internal, it would make no difference whether its properties/methods are internal or public (quote: "a public member of an internal class is effectively internal."). Clearly it does.
Here is a Minimal Reproducible Example:
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace ConsoleApp1
{
    internal class InputData
    {
        [ColumnName("onnx::Concat_0")]
        [VectorType(342)]
        // public float[] XA1 { get; set; } = new float[342]; // Works
        internal float[] XA1 { get; set; } = new float[342]; // Fails
    }

    public class OutputData
    {
        [ColumnName("onnx::Concat_0")]
        [VectorType(342)]
        public float[] XA1 { get; set; }
    }

    class Program
    {
        static void Main()
        {
            var mlContext = new MLContext();
            var inputData = new[]
            {
                new InputData { XA1 = new float[342] }
            };
            IDataView inputDataView = mlContext.Data.LoadFromEnumerable(inputData);

            // Simulate an ONNX transformation pipeline
            var pipeline = mlContext.Transforms.CopyColumns(
                "onnx::Concat_0",
                "onnx::Concat_0"
            );
            var model = pipeline.Fit(inputDataView);
            IDataView transformedData = model.Transform(inputDataView);

            var transformed = mlContext.Data.CreateEnumerable<OutputData>(transformedData, reuseRowObject: false);
            foreach (var item in transformed)
            {
                Console.WriteLine($"XA1 Length: {item.XA1.Length}");
            }
        }
    }
}
In case that helps: net8.0, JetBrains Rider 2024.2.6.
Indeed, in a "normal" context there's no difference between an internal and a public member of an internal type. In a reflection context, however, there is, and this is what causes this behaviour.
If you navigate [1] through ML.NET's code, you'll see that when you call mlContext.Data.LoadFromEnumerable(inputData), ML.NET gathers information about your class's fields and properties inside SchemaDefinition.GetMemberInfos(Type, Direction). Here's the relevant code inside it:
var fieldInfos = userType.GetFields(BindingFlags.Public | BindingFlags.Instance);
var propertyInfos = userType.GetProperties(BindingFlags.Public | BindingFlags.Instance)
...
Here you can see they use BindingFlags.Public, which means only public properties/fields are taken. If they wanted to process internal (or rather, all non-public) properties/fields, they'd have used BindingFlags.NonPublic (or BindingFlags.Public | BindingFlags.NonPublic if they wanted both public and non-public).
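To see the effect of these flags in isolation, here is a minimal sketch that does not use ML.NET at all. The Sample class, its Hidden and Visible properties, and the BindingFlagsDemo helper are hypothetical names made up for this illustration; only Type.GetProperties and the BindingFlags values are real .NET APIs.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for InputData: an internal class with one
// internal and one public property.
internal class Sample
{
    internal float[] Hidden { get; set; } = new float[3];
    public float[] Visible { get; set; } = new float[3];
}

internal static class BindingFlagsDemo
{
    // Returns the sorted names of the instance properties matched by the given flags.
    internal static string[] PropertyNames(Type type, BindingFlags flags) =>
        type.GetProperties(flags | BindingFlags.Instance)
            .Select(p => p.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToArray();

    private static void Main()
    {
        // Same visibility flag as the ML.NET snippet: the internal property is skipped.
        Console.WriteLine(string.Join(", ",
            PropertyNames(typeof(Sample), BindingFlags.Public)));      // Visible

        // Non-public only: the internal property is the one that shows up.
        Console.WriteLine(string.Join(", ",
            PropertyNames(typeof(Sample), BindingFlags.NonPublic)));   // Hidden

        // Both flags combined return both properties.
        Console.WriteLine(string.Join(", ",
            PropertyNames(typeof(Sample), BindingFlags.Public | BindingFlags.NonPublic))); // Hidden, Visible
    }
}
```

Note that the class being internal plays no role here: reflection classifies each property by its own declared accessibility, which is why a public property on an internal class is still found with BindingFlags.Public.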
Reflection-based operations of this kind typically only care about public members because, as the accepted answer to the question you linked says, non-public members are implementation details.
When you call pipeline.Fit(inputDataView), it looks for the property with [ColumnName("onnx::Concat_0")], but since ML.NET didn't process your internal property, it finds none and throws that exception.
[1] The whole path from your call to LoadFromEnumerable to the code doing reflection is:
DataOperationsCatalog.LoadFromEnumerable<TRow>(IEnumerable<TRow>, SchemaDefinition)
DataViewConstructionUtils.CreateFromEnumerable<TRow>(IHostEnvironment, IEnumerable<TRow>, SchemaDefinition)
InternalSchemaDefinition.Create(Type, SchemaDefinition.Direction)
SchemaDefinition.Create(Type, Direction)
SchemaDefinition.GetMemberInfos(Type, Direction)