
How to parse data from a CSV file with a single header into different types of instances using GetRecord?


I have tried following the approach in the link https://joshclose.github.io/CsvHelper/examples/reading/reading-multiple-record-types/ to implement a specific requirement of converting a single CSV file into instances of different types. However, the example provided in the link chooses not to write the CSV header, which has posed a challenge for me. In my ClassMap implementations, I cannot use Name and AutoMap because removing the header causes issues. And manually specifying the Index in each ClassMap seems tedious.

So I tried writing the header in the CSV file and setting the HasHeaderRecord property of the CsvConfiguration used by CsvReader to true. At the same time, I temporarily kept the index-based mapping. The core code is as follows:

 public void CSVDeserialize() {
        var config = new CsvConfiguration(CultureInfo.InvariantCulture) {
            HasHeaderRecord = true,
            MissingFieldFound = null
        };
        using (var reader = new StreamReader(Application.dataPath + "/Data/Events/0.csv"))
            using (var csv = new CsvReader(reader, config)) {
                csv.Context.RegisterClassMap<FUF_Event_Dialogue.ClassMap>();
                csv.Context.RegisterClassMap<FUF_Event_Avatar.ClassMap>();

                List<FUF_Event_Dialogue> dialogues = new List<FUF_Event_Dialogue>();
                List<FUF_Event_Avatar> avatars = new List<FUF_Event_Avatar>();

                while (csv.Read()) {
                    switch (csv.GetField(0)) {
                        case "Dialogue":
                            dialogues.Add(csv.GetRecord<FUF_Event_Dialogue>());
                            break;
                        case "Avatar":
                            avatars.Add(csv.GetRecord<FUF_Event_Avatar>());
                            break;
                    }
                } 

                // Print Results
                foreach (FUF_Event_Avatar avatar in avatars) {
                    Debug.Log($"[Avatar] ID: {avatar.ID}, Delay: {avatar.Delay}, IsAsync: {avatar.IsAsync}, Animation: \"{avatar.Animation.AvatarId + "," + avatar.Animation.Label + "," + avatar.Animation.Emotion + "," + avatar.Animation.IsShutUp + "," + avatar.Animation.IsSkipPreAnim}\", Movement: {avatar.Movement.TimeSpeed + (avatar.Movement.UseSpeed ? "/s" : "ms") + "=>" + avatar.Movement.Position.x + "," + avatar.Movement.Position.y}");
                }

                foreach (FUF_Event_Dialogue dialogue in dialogues) {
                    Debug.Log($"[Dialogue] ID: {dialogue.ID}, Delay: {dialogue.Delay}, IsAsync: {dialogue.IsAsync}, Content: {dialogue.Content.ID}, Label: {dialogue.Label}, Branches: \"{dialogue.Branches?.Select(b => b.Content.ID + "," + b.GotoId).Aggregate((c, n) => c + ";" + n)}\"");
                }
            }
    }

public class FUF_Event_Dialogue : IFUF_Event {
        public int ID { get; set; }
        public float Delay { get; set; }
        public bool IsAsync { get; set; }
        public FUF_String Content { get; set; }
        public string Label { get; set; }
        public FUF_Branch[] Branches => branches ??= branchEnumerable?.ToArray();

        FUF_Branch[] branches;
        IEnumerable<FUF_Branch> branchEnumerable;

        public sealed class ClassMap : ClassMap<FUF_Event_Dialogue> {
            public ClassMap() {
                Map(m => m.ID).Index(1);
                Map(m => m.Delay).Index(2);
                Map(m => m.IsAsync).Index(3).Default(false);
                Map(m => m.Content).Index(4).Default(null).TypeConverter<FUF_String.Deserializer>();
                Map(m => m.Label).Index(5).Default("");
                Map(m => m.branchEnumerable).Index(6).TypeConverter<FUF_Branch.Deserializer>();
            }
        }
}

public class FUF_Event_Avatar : IFUF_Event {
        public int ID { get; set; }
        public float Delay { get; set; }
        public bool IsAsync { get; set; }
        public FUF_Animation Animation { get; set; }
        public FUF_Movement Movement { get; set; }

        public sealed class ClassMap : ClassMap<FUF_Event_Avatar> {
            public ClassMap() {
                Map(m => m.ID).Index(1);
                Map(m => m.Delay).Index(2).Default(0);
                Map(m => m.IsAsync).Index(3).Default(false);
                Map(m => m.Animation).Index(4).TypeConverter<FUF_Animation.Deserializer>();
                Map(m => m.Movement).Index(5).TypeConverter<FUF_Movement.Deserializer>();
            }
        }
}

After executing this, I noticed that the first row after the header is ignored and never parsed. Here are my CSV and the output log:

CSV:

Type,ID,Delay,IsAsync,Property_4,Property_5,Property_6
Avatar,0,0,FALSE,"0,Jack,Stand,False,True","100ms=>256,128"
Avatar,1,0,FALSE,"0,Mary,Stand,False,True","20/s=>-256,128"
Dialogue,2,0,FALSE,1001,Jack
Dialogue,3,0,FALSE,1002,Mary,"9001,1005;9002,1006"
Dialogue,4,0,FALSE,1003,Jack
Dialogue,5,0,FALSE,1004,Jack,"256,128;257,129"

Log:

[Avatar] ID: 1, Delay: 0, IsAsync: False, Animation: "0,Mary,Stand,False,True", Movement: 20/s=>-256,128
[Dialogue] ID: 2, Delay: 0, IsAsync: False, Content: 1001, Label: Jack, Branches: ""
[Dialogue] ID: 3, Delay: 0, IsAsync: False, Content: 1002, Label: Mary, Branches: "9001,1005;9002,1006"
[Dialogue] ID: 4, Delay: 0, IsAsync: False, Content: 1003, Label: Jack, Branches: ""
[Dialogue] ID: 5, Delay: 0, IsAsync: False, Content: 1004, Label: Jack, Branches: "256,128;257,129"

Why is this happening? (I can confirm that without specifying the Header and setting HasHeaderRecord to false, everything is parsed successfully.)

Next, if I change the properties in the ClassMap that have the same names as the Header in the CSV to use the default Name-based parsing or use AutoMap, I encounter the issue where these properties are all parsed as default values like 0 or false. Here is the code I modified and the corresponding log:

        public sealed class ClassMap : ClassMap<FUF_Event_Dialogue> {
            public ClassMap() {
                Map(m => m.ID);
                Map(m => m.Delay);
                Map(m => m.IsAsync).Default(false);
                Map(m => m.Content).Index(4).Default(null).TypeConverter<FUF_String.Deserializer>();
                Map(m => m.Label).Index(5).Default("");
                Map(m => m.branchEnumerable).Index(6).TypeConverter<FUF_Branch.Deserializer>();
                // Or AutoMap, cause the same issue
                /*AutoMap(new CsvConfiguration(CultureInfo.InvariantCulture) {
                    HasHeaderRecord = true,
                    MissingFieldFound = null
                });*/
            }
        }

Log:

[Avatar] ID: 0, Delay: 0, IsAsync: False, Animation: "0,Mary,Stand,False,True", Movement: 20/s=>-256,128
[Dialogue] ID: 0, Delay: 0, IsAsync: False, Content: 1001, Label: Jack, Branches: ""
[Dialogue] ID: 0, Delay: 0, IsAsync: False, Content: 1002, Label: Mary, Branches: "9001,1005;9002,1006"
[Dialogue] ID: 0, Delay: 0, IsAsync: False, Content: 1003, Label: Jack, Branches: ""
[Dialogue] ID: 0, Delay: 0, IsAsync: False, Content: 1004, Label: Jack, Branches: "256,128;257,129"

As you can see, in this scenario, using AutoMap or name-based mapping for the "ID" field does not work correctly; the result is always 0.

If you modify the ClassMap for the type of the first row that was ignored after the Header, even more severe issues can occur:

        public sealed class ClassMap : ClassMap<FUF_Event_Avatar> {
            public ClassMap() {
                Map(m => m.ID);
                Map(m => m.Delay).Default(0);
                Map(m => m.IsAsync).Default(false);
                Map(m => m.Animation).Index(4).TypeConverter<FUF_Animation.Deserializer>();
                Map(m => m.Movement).Index(5).TypeConverter<FUF_Movement.Deserializer>();
                // Or AutoMap, cause the same issue
                /*AutoMap(new CsvConfiguration(CultureInfo.InvariantCulture) {
                    HasHeaderRecord = true,
                    MissingFieldFound = null
                });*/
            }
        }

Error:

HeaderValidationException: Header with name 'ID'[0] was not found.
Header with name 'Delay'[0] was not found.
Header with name 'IsAsync'[0] was not found.
Headers: 'Avatar', '0', '0', 'FALSE', '0,Jack,Stand,False,True', '100ms=>256,128'
Headers: 'Avatar', '0', '0', 'FALSE', '0,Jack,Stand,False,True', '100ms=>256,128'
Headers: 'Avatar', '0', '0', 'FALSE', '0,Jack,Stand,False,True', '100ms=>256,128'
Headers: 'Avatar', '0', '0', 'FALSE', '0,Jack,Stand,False,True', '100ms=>256,128'
Headers: 'Avatar', '0', '0', 'FALSE', '0,Jack,Stand,False,True', '100ms=>256,128'
Headers: 'Avatar', '0', '0', 'FALSE', '0,Jack,Stand,False,True', '100ms=>256,128'
If you are expecting some headers to be missing and want to ignore this validation, set the configuration HeaderValidated to null. You can also change the functionality to do something else, like logging the issue.

IReader state:
   ColumnCount: 0
   CurrentIndex: 0
   HeaderRecord:
["Avatar","0","0","FALSE","0,Jack,Stand,False,True","100ms=>256,128"]
IParser state:
   ByteCount: 0
   CharCount: 117
   Row: 2
   RawRow: 2
   Count: 6
   RawRecord:
Avatar,0,0,FALSE,"0,Jack,Stand,False,True","100ms=>256,128"


CsvHelper.Configuration.ConfigurationFunctions.HeaderValidated (CsvHelper.HeaderValidatedArgs args) (at <809ff6784d2e43bdb7ac8274e1351270>:0)
CsvHelper.CsvReader.ValidateHeader (System.Type type) (at <809ff6784d2e43bdb7ac8274e1351270>:0)
CsvHelper.CsvReader.ValidateHeader[T] () (at <809ff6784d2e43bdb7ac8274e1351270>:0)
CsvHelper.CsvReader.GetRecord[T] () (at <809ff6784d2e43bdb7ac8274e1351270>:0)

From the error log, you can see that 'Avatar', '0', '0', 'FALSE', '0,Jack,Stand,False,True', '100ms=>256,128', which is the line that was previously mentioned as being ignored, is actually treated as the Header.

Is there a way to address these issues without removing the Header? If possible, I would still like to include the Header to keep the code and CSV clean.


Solution

  • You could have found your problem by modifying your Read() loop to add a default: case:

    while (csv.Read()) {
        string type;
        switch (type = csv.GetField(0)) {
            // Cases "Dialogue" and "Avatar" as before
            default:
                //Debug.Log(string.Format("Unknown type {0}", type));
                Console.WriteLine(string.Format("Unknown type {0}", type));
                break;
        }
    } 
    

    If you had, you would have seen an error message:

    Unknown type Type
    

    Thus your header row is actually getting consumed and discarded by the first Read(). So when does the header record actually get read and processed? Possibilities include:

    1. Whenever you explicitly call CsvReader.ReadHeader() (which you don't do), or

    2. Your first call to CsvReader.GetRecord<T>(). As can be seen from the source code, this method automatically reads the current row as the header row if it has not already been read, then reads the row after that as a data row:

      public virtual T? GetRecord<T>()
      {
          CheckHasBeenRead();
      
          if (headerRecord == null && hasHeaderRecord)
          {
              ReadHeader();
              ValidateHeader<T>();
      
              if (!Read())
              {
                  return default;
              }
          }

          // ... (remainder of the method omitted)
      }

    Thus what your code actually does is:

    1. Skips past the first row.
    2. Reads the second row as the header row.
    3. Reads the third row as the first data row.

    Demo fiddle #1 here.[1]
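
The three steps above can be reproduced with a small in-memory CSV. This is only a minimal sketch: the `Row` model, its `RowMap`, and the three-line sample text are hypothetical stand-ins for the question's types, and the behaviour described in the comments is what the `GetRecord<T>()` excerpt above implies rather than a guaranteed result for every CsvHelper version.

```csharp
using System;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

// Hypothetical stand-in model, mapped by index as in the question.
public class Row {
    public string Type { get; set; }
    public int ID { get; set; }
}

public sealed class RowMap : ClassMap<Row> {
    public RowMap() {
        Map(m => m.Type).Index(0);
        Map(m => m.ID).Index(1);
    }
}

public static class HeaderConsumptionDemo {
    // Returns the first field seen by the first Read(), the header CsvHelper
    // ends up with, and the ID of the first record returned by GetRecord<T>().
    public static (string firstField, string header, int id) Run() {
        var text = "Type,ID\nAvatar,0\nAvatar,1\n";
        var config = new CsvConfiguration(CultureInfo.InvariantCulture) {
            HasHeaderRecord = true
        };
        using (var reader = new StringReader(text))
        using (var csv = new CsvReader(reader, config)) {
            csv.Context.RegisterClassMap<RowMap>();

            csv.Read();                        // Step 1: lands on the real title row.
            var firstField = csv.GetField(0);

            csv.Read();                        // Now positioned on "Avatar,0".
            var record = csv.GetRecord<Row>(); // Steps 2 and 3: the current row becomes the
                                               // header and the following row becomes the record.
            return (firstField, string.Join(",", csv.HeaderRecord), record.ID);
        }
    }

    public static void Main() {
        var result = Run();
        Console.WriteLine("First Read() saw: " + result.firstField);
        Console.WriteLine("Header according to CsvHelper: " + result.header);
        Console.WriteLine("First record ID: " + result.id);
    }
}
```

If `GetRecord<T>()` behaves as in the excerpt, the header reported here is the `Avatar,0` row and the first record comes from the `Avatar,1` row, which matches the missing first row in the question's log.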

    Since you obviously don't want that, you should simply set CsvConfiguration.HasHeaderRecord = false and your CSV file will be read correctly, because the Read() loop will discard the initial header line (and any other line whose first field is neither Avatar nor Dialogue.) Furthermore, I would argue that this fix is correct. Your two models FUF_Event_Dialogue and FUF_Event_Avatar have different property names, so a polymorphic header row for both would have column titles for the combined set of properties, like so:

    Type,ID,Delay,IsAsync,Content,Label,Branches,Animation,Movement
    

    Then when deserializing a specific row only the relevant columns would get used. But in your actual CSV the column titles are meaningless anyway (as there are no actual properties named Property_4, Property_5 or Property_6 in any of your models) and only the indices matter.
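
A minimal sketch of the index-only approach described above, with HasHeaderRecord set to false so that the title line is simply skipped by the loop. The `AvatarRow` and `DialogueRow` classes and their maps are simplified, hypothetical stand-ins: the question's custom types (FUF_String, FUF_Animation and so on) are replaced by string because their definitions are not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

// Simplified stand-ins for the question's models (custom types replaced by string).
public class AvatarRow {
    public int ID { get; set; }
    public float Delay { get; set; }
    public bool IsAsync { get; set; }
    public string Animation { get; set; }
    public string Movement { get; set; }
}

public class DialogueRow {
    public int ID { get; set; }
    public float Delay { get; set; }
    public bool IsAsync { get; set; }
    public string Content { get; set; }
    public string Label { get; set; }
    public string Branches { get; set; }
}

public sealed class AvatarRowMap : ClassMap<AvatarRow> {
    public AvatarRowMap() {
        Map(m => m.ID).Index(1);
        Map(m => m.Delay).Index(2);
        Map(m => m.IsAsync).Index(3);
        Map(m => m.Animation).Index(4);
        Map(m => m.Movement).Index(5);
    }
}

public sealed class DialogueRowMap : ClassMap<DialogueRow> {
    public DialogueRowMap() {
        Map(m => m.ID).Index(1);
        Map(m => m.Delay).Index(2);
        Map(m => m.IsAsync).Index(3);
        Map(m => m.Content).Index(4);
        Map(m => m.Label).Index(5);
        Map(m => m.Branches).Index(6);
    }
}

public static class NoHeaderDemo {
    public static (List<AvatarRow> avatars, List<DialogueRow> dialogues) Parse(TextReader reader) {
        var config = new CsvConfiguration(CultureInfo.InvariantCulture) {
            HasHeaderRecord = false,   // CsvHelper never treats a row as the header.
            MissingFieldFound = null   // Rows shorter than the map are tolerated.
        };
        var avatars = new List<AvatarRow>();
        var dialogues = new List<DialogueRow>();
        using (var csv = new CsvReader(reader, config)) {
            csv.Context.RegisterClassMap<AvatarRowMap>();
            csv.Context.RegisterClassMap<DialogueRowMap>();
            while (csv.Read()) {
                switch (csv.GetField(0)) {
                    case "Avatar": avatars.Add(csv.GetRecord<AvatarRow>()); break;
                    case "Dialogue": dialogues.Add(csv.GetRecord<DialogueRow>()); break;
                    default: break;    // Title line or unknown type: skipped.
                }
            }
        }
        return (avatars, dialogues);
    }

    public static void Main() {
        var text = string.Join("\n", new[] {
            "Type,ID,Delay,IsAsync,Property_4,Property_5,Property_6",
            "Avatar,0,0,FALSE,\"0,Jack,Stand,False,True\",\"100ms=>256,128\"",
            "Avatar,1,0,FALSE,\"0,Mary,Stand,False,True\",\"20/s=>-256,128\"",
            "Dialogue,2,0,FALSE,1001,Jack",
            "Dialogue,3,0,FALSE,1002,Mary,\"9001,1005;9002,1006\""
        });
        var result = Parse(new StringReader(text));
        Console.WriteLine("Avatars: " + result.avatars.Count + ", Dialogues: " + result.dialogues.Count);
    }
}
```

The title line stays in the file for human readers, but the code never depends on its contents.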

    Demo fiddle #2 here.
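
For completeness, if the combined header row shown earlier were adopted, one possible variant is to consume the title row explicitly with Read() followed by ReadHeader() before entering the loop, so that GetRecord<T>() no longer mistakes the first data row for the header. The sketch below assumes that combined header, relies on CsvHelper's automatic name-based mapping, and again uses hypothetical string-only models (`NamedAvatar`, `NamedDialogue`) in place of the question's types; it is an illustration, not the approach the fiddles demonstrate.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

// Hypothetical simplified models; property names match the combined header.
public class NamedAvatar {
    public int ID { get; set; }
    public float Delay { get; set; }
    public bool IsAsync { get; set; }
    public string Animation { get; set; }
    public string Movement { get; set; }
}

public class NamedDialogue {
    public int ID { get; set; }
    public float Delay { get; set; }
    public bool IsAsync { get; set; }
    public string Content { get; set; }
    public string Label { get; set; }
    public string Branches { get; set; }
}

public static class CombinedHeaderDemo {
    public static (List<NamedAvatar> avatars, List<NamedDialogue> dialogues) Parse(TextReader reader) {
        var config = new CsvConfiguration(CultureInfo.InvariantCulture) {
            HasHeaderRecord = true
        };
        var avatars = new List<NamedAvatar>();
        var dialogues = new List<NamedDialogue>();
        using (var csv = new CsvReader(reader, config)) {
            // Read the title row ourselves and register it as the header,
            // so the first GetRecord<T>() call has nothing left to consume.
            if (!csv.Read()) {
                return (avatars, dialogues);
            }
            csv.ReadHeader();

            while (csv.Read()) {
                switch (csv.GetField("Type")) {
                    case "Avatar": avatars.Add(csv.GetRecord<NamedAvatar>()); break;
                    case "Dialogue": dialogues.Add(csv.GetRecord<NamedDialogue>()); break;
                }
            }
        }
        return (avatars, dialogues);
    }

    public static void Main() {
        // Every row carries all nine columns; unused ones are left empty.
        var text = string.Join("\n", new[] {
            "Type,ID,Delay,IsAsync,Content,Label,Branches,Animation,Movement",
            "Avatar,0,0,FALSE,,,,\"0,Jack,Stand,False,True\",\"100ms=>256,128\"",
            "Dialogue,2,0,FALSE,1001,Jack,,,"
        });
        var result = Parse(new StringReader(text));
        Console.WriteLine("Avatars: " + result.avatars.Count + ", Dialogues: " + result.dialogues.Count);
    }
}
```

The trade-off is a wider CSV with empty cells on every row, which is why the index-based fix may still be the simpler choice for the file as it currently stands.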


    [1] The code shown in your question cannot be compiled because it is missing the type definitions for several properties. I was able to reproduce the problem by using string for any unknown property types.