I'm using csvHelper to populate a database from a .CSV file. Any record being read from the .csv file has 3 child classes, either of these child classes may or may not already exist in the database.
My issue is, that csvHelper creates a new record for every child class present, where instead it should do a lookup for existing records in the database, and if present use this. The result is, that I end up with many duplicate entries in the child class tables.
I'm using Unity container with a unit of work approach.
Here is my code
public class Game {
public Referee Referee { get; set; }
public Team HomeTeam { get; set; }
public Team AwayTeam { get; set; }
}
public class Referee {
public string Name { get; set; }
}
public class Team {
public string Name { get; set; }
}
Here is my class map
public sealed class GameMap : CsvClassMap<Game> {
public GameMap () {
References<RefereeMap>(m => m.Referee);
References<HomeTeamMap>(m => m.HomeTeam);
References<AwayTeamMap>(m => m.AwayTeam);
}
}
public sealed class RefereeMap : CsvClassMap<Referee> {
public RefereeMap () {
Map(m => m.Name).Name("RefereeName");
}
}
public sealed class HomeTeamMap : CsvClassMap<Team> {
public HomeTeamMap () {
Map(m => m.Name).Name("TeamName");
}
}
public sealed class AwayTeamMap : CsvClassMap<Team> {
public AwayTeamMap () {
Map(m => m.Name).Name("TeamName");
}
}
I would typically access a CSV file using the CSV as a Stream resource, as such:
byte[] byteData = webClient.DownloadData(uriAddress);
Stream byteStream = new MemoryStream(byteData);
TextReader reader = new StreamReader(byteStream);
var csv = new CsvReader( reader );
csv.Configuration.RegisterClassMap<GameMap>();
csv.Configuration.RegisterClassMap<RefereeMap>();
csv.Configuration.RegisterClassMap<HomeTeamMap>();
csv.Configuration.RegisterClassMap<AwayTeamMap>();
var records = new List<Game>();
while (csv.Read())
{
records.Add(csv.GetRecord<Game>());
}
...
A few lines from a CSV file would typically look like this
Home, Away, Referee
Leeds, Leicester, Steve Dunn
Derby, Everton, Steve Dunn
Leicester, Man United, Andy Hall
Everton, Leicester, Andy Hall
In this example, there would be a total of 4 games created, 8 teams created and 4 referees. Leicester, Everton, Steve Dunn and Andy Hall would all contain duplicates, which is incorrect, ie three Team objects for Leicester and 2 Referee objects for Andy Hall etc.
The .CSV file that i'm using is in a flat format and has as a row for each game played. There are columns for the Home team, Away team and referee. There are other columns, but for the purpose of the question I'm asking, the other details are irrelevant.
When csvHelper reads a game record, it creates 1 new referee and 2 new teams. After reading 300 games, there are 300 referees (250+ duplicates) and 600 teams (550+ duplicates) in the database. Ideally, the referee and teams should be looked up from the database first before creating a new one.
Because there are two teams per game, two new team objects are created for each line or game in the CSV file.
I'm really not sure how to make this work using mappings, assistance would be welcome.
Thank you in advance.
I've looked at the CsvHelper
library and its internal object construction logic.
While you can use clever configuration tricks, I found that the best way to do what you want is very simple. You just keep track of the unique Referee
s and Team
s and link the Game
instances to the previous instances when available.
public static class SetExtensions {
public static TValue GetExistingOrAdd<TKey, TValue>(this Dictionary<TKey, TValue> set, TKey key, TValue value) {
TValue existing;
if (set.TryGetValue(key, out existing)) {
return existing;
}
set.Add(key, value);
return value;
}
}
class Program {
static void Main(string[] args) {
Stream inputStream = new MemoryStream();
using (var sw = new StreamWriter(inputStream, Encoding.UTF8, 4096, true)) {
sw.WriteLine("Home, Away, Referee");
sw.WriteLine("Leeds, Leicester, Steve Dunn");
sw.WriteLine("Derby, Everton, Steve Dunn");
sw.WriteLine("Leicester, Man United, Andy Hall");
sw.WriteLine("Everton, Leicester, Andy Hall");
}
inputStream.Position = 0;
TextReader reader = new StreamReader(inputStream);
var csv = new CsvReader(reader);
csv.Configuration.TrimFields = true;
csv.Configuration.TrimHeaders = true;
csv.Configuration.RegisterClassMap<GameMap>(); //You only need to register the "root" map
var referees = new Dictionary<string, Referee>(); //Stores unique referees. You can use the full Referee object as a key if you implement IEquatable<Referee> for Referee
var teams = new Dictionary<string, Team>();
var records = new List<Game>();
while (csv.Read()) {
var record = csv.GetRecord<Game>();
record.Referee = referees.GetExistingOrAdd(record.Referee.Name, record.Referee); //Try to link to existing object
record.HomeTeam = teams.GetExistingOrAdd(record.HomeTeam.Name, record.HomeTeam); //Try to link to existing object
record.AwayTeam = teams.GetExistingOrAdd(record.AwayTeam.Name, record.AwayTeam); //Try to link to existing object
records.Add(record);
}
}