Search code examples
c#c#-4.0csvhelper

csvHelper dynamically loading child class


I'm using csvHelper to populate a database from a .CSV file. Any record being read from the .csv file has 3 child classes, either of these child classes may or may not already exist in the database.

My issue is, that csvHelper creates a new record for every child class present, where instead it should do a lookup for existing records in the database, and if present use this. The result is, that I end up with many duplicate entries in the child class tables.

I'm using Unity container with a unit of work approach.

Here is my code

 public class Game {    
     public Referee Referee { get; set; } 
     public Team HomeTeam { get; set; }
     public Team AwayTeam { get; set; }
 }

 public class Referee {
    public string Name { get; set; }
 }

 public class Team {
     public string Name { get; set; }   
 }

Here is my class map

 public sealed class GameMap : CsvClassMap<Game> {
    public GameMap ()        {
        References<RefereeMap>(m => m.Referee);
         References<HomeTeamMap>(m => m.HomeTeam);
         References<AwayTeamMap>(m => m.AwayTeam);
    }
 }

 public sealed class RefereeMap : CsvClassMap<Referee> {
    public RefereeMap ()        {
        Map(m => m.Name).Name("RefereeName");
    }
 }

 public sealed class HomeTeamMap : CsvClassMap<Team> {
    public HomeTeamMap ()       {
        Map(m => m.Name).Name("TeamName");
    }
 }

 public sealed class AwayTeamMap : CsvClassMap<Team> {
    public AwayTeamMap ()       {
        Map(m => m.Name).Name("TeamName");
    }
 }

I would typically access a CSV file using the CSV as a Stream resource, as such:

byte[] byteData = webClient.DownloadData(uriAddress);
Stream byteStream = new MemoryStream(byteData);
TextReader reader = new StreamReader(byteStream);

var csv = new CsvReader( reader );
csv.Configuration.RegisterClassMap<GameMap>();
csv.Configuration.RegisterClassMap<RefereeMap>();
csv.Configuration.RegisterClassMap<HomeTeamMap>();
csv.Configuration.RegisterClassMap<AwayTeamMap>();

var records = new List<Game>();
while (csv.Read())
{
     records.Add(csv.GetRecord<Game>());
}
...

A few lines from a CSV file would typically look like this

Home,      Away,       Referee
Leeds,     Leicester,  Steve Dunn
Derby,     Everton,    Steve Dunn
Leicester, Man United, Andy Hall
Everton,   Leicester,  Andy Hall

In this example, there would be a total of 4 games created, 8 teams created and 4 referees. Leicester, Everton, Steve Dunn and Andy Hall would all contain duplicates, which is incorrect, ie three Team objects for Leicester and 2 Referee objects for Andy Hall etc.

The .CSV file that i'm using is in a flat format and has as a row for each game played. There are columns for the Home team, Away team and referee. There are other columns, but for the purpose of the question I'm asking, the other details are irrelevant.

When csvHelper reads a game record, it creates 1 new referee and 2 new teams. After reading 300 games, there are 300 referees (250+ duplicates) and 600 teams (550+ duplicates) in the database. Ideally, the referee and teams should be looked up from the database first before creating a new one.

Because there are two teams per game, two new team objects are created for each line or game in the CSV file.

I'm really not sure how to make this work using mappings, assistance would be welcome.

Thank you in advance.


Solution

  • I've looked at the CsvHelper library and its internal object construction logic.

    While you can use clever configuration tricks, I found that the best way to do what you want is very simple. You just keep track of the unique Referees and Teams and link the Game instances to the previous instances when available.

    public static class SetExtensions {
        public static TValue GetExistingOrAdd<TKey, TValue>(this Dictionary<TKey, TValue> set, TKey key, TValue value) {
            TValue existing;
            if (set.TryGetValue(key, out existing)) {
                return existing;
            }
            set.Add(key, value);
            return value;
        }
    }
    
    class Program {
        static void Main(string[] args) {
            Stream inputStream = new MemoryStream();
            using (var sw = new StreamWriter(inputStream,  Encoding.UTF8, 4096, true)) {
                sw.WriteLine("Home,      Away,       Referee");
                sw.WriteLine("Leeds,     Leicester,  Steve Dunn");
                sw.WriteLine("Derby,     Everton,    Steve Dunn");
                sw.WriteLine("Leicester, Man United, Andy Hall");
                sw.WriteLine("Everton,   Leicester,  Andy Hall");
            }
            inputStream.Position = 0;
    
            TextReader reader = new StreamReader(inputStream);
    
            var csv = new CsvReader(reader);
            csv.Configuration.TrimFields = true;
            csv.Configuration.TrimHeaders = true;
            csv.Configuration.RegisterClassMap<GameMap>(); //You only need to register the "root" map
    
            var referees = new Dictionary<string, Referee>(); //Stores unique referees. You can use the full Referee object as a key if you implement IEquatable<Referee> for Referee
            var teams = new Dictionary<string, Team>();
            var records = new List<Game>();
            while (csv.Read()) {
                var record = csv.GetRecord<Game>();
                record.Referee = referees.GetExistingOrAdd(record.Referee.Name, record.Referee); //Try to link to existing object
                record.HomeTeam = teams.GetExistingOrAdd(record.HomeTeam.Name, record.HomeTeam); //Try to link to existing object
                record.AwayTeam = teams.GetExistingOrAdd(record.AwayTeam.Name, record.AwayTeam); //Try to link to existing object
    
                records.Add(record);
            }
        }