I've been puzzling with this one for a while. Given these classes:
public class Person
{
public int Id { get; set; }
public string Name { get; set;}
public List<Book> Books { get; set; }
}
public class Book
{
public Guid Id { get; set; }
public string Name { get; set;}
public List<Chapter> Chapters{ get; set; }
public List<Author> Authors { get; set; }
}
public class Author
{
public string Name { get; set; }
}
public class Chapter
{
public string Name { get; set;}
}
I'd like to spit out a csv that doesn't repeat the top level relationship, and starts new rows in the correct column. For Example
personId | Name | BookName | AuthorName | ChapterName |
1 | name1 | Book1Name | AuthorName1 | chapterName1|
| | | AuthorName2 | |
| | Book2Name | AuthorName1 | chapterName1|
2 | name2 | Book3Name | AuthorName3 | chapterName3|
3 | name3 | Book1Name | | chapterName4|
| | | | chapterName5|
I've tried messing around with the csv maps and calling references, and also using csv.WriteField(null) to skip to the column I want, but nothing works exactly like I need it to.
The difference in structure between your your data model object graph and your desired final output model is too extreme for CsvHelper (or any other serializer) to do the mapping "on the fly". Instead, you will need to create some DTO that represents your desired final CSV row format, then create some elaborate LINQ query that projects your object graph to an enumerable of DTOs. Having done that, you will be able to serialize the DTOs using CsvHelper.
First, for a DTO, you can use a simple record like the following:
public record PersonAndBookDTO(int? personId, string? Name, string? BookName, string? AuthorName, string? ChapterName);
Next, define the following query:
var query =
from p in people
from b in p.Books.DefaultIfEmpty().Select((book, index) => (book, index))
let firstBookOfPerson = b.index == 0
from pair in (b.book?.Authors?.Select(a => a.Name)).DefaultIfEmptyOrNull()
.Merge((b.book?.Chapters?.Select(c => c.Name)).DefaultIfEmptyOrNull(),
(author, chapter) => (author, chapter))
.Select((pair, index) => (pair.author, pair.chapter, index))
let firstPairOfBook = pair.index == 0
select new PersonAndBookDTO(firstBookOfPerson && firstPairOfBook? p.Id : (int?)null,
firstBookOfPerson && firstPairOfBook? p.Name : null,
firstPairOfBook ? b.book?.Name : null,
pair.author, pair.chapter);
Using the following extension methods:
public static class EnumerableExtensions
{
// Copied from this anser https://stackoverflow.com/a/6624756/3744182 by https://stackoverflow.com/users/203499/damian
// To https://stackoverflow.com/questions/1190657/add-two-lists-of-different-length-in-c-sharp
public static IEnumerable<TResult> Merge<TFirst,TSecond,TResult>(this IEnumerable<TFirst> first,
IEnumerable<TSecond> second, Func<TFirst, TSecond, TResult> operation) {
using (var iter1 = first.GetEnumerator()) {
using (var iter2 = second.GetEnumerator()) {
while (iter1.MoveNext()) {
if (iter2.MoveNext()) {
yield return operation(iter1.Current, iter2.Current);
} else {
yield return operation(iter1.Current, default(TSecond));
}
}
while (iter2.MoveNext()) {
yield return operation(default(TFirst), iter2.Current);
}
}
}
}
public static IEnumerable<TSource?> DefaultIfEmptyOrNull<TSource>(this IEnumerable<TSource>? source) => (source ?? Enumerable.Empty<TSource>()).DefaultIfEmpty();
}
And now you will be able to serialize to CSV as follows:
var sb = new StringBuilder();
using (var writer = new StringWriter(sb)) // Or use new StreamWriter(filename) if serializing to a file
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
csv.WriteRecords(query);
}
Notes:
Books.DefaultIfEmpty()
is used to ensure one row is generated for each person even if that person has no books.
Authors.DefaultIfEmptyOrNull()
and Chapters.DefaultIfEmptyOrNull()
are used to ensure one row is generated per book even if the book has no authors or chapters.
Enumerable.Select((item, index) => (item, index))
is used to keep track of whether a given book or author/chapter pair is the first one encountered in its parent. This is necessary because you don't want to duplicate person or book names.
Your question doesn't indicate how to pair together items from the list of authors and list of chapters when there are more than one of both. I adopted the Merge()
solution from this answer by damian to Add two Lists of different length in C# which pairs them by index and uses a default value when one list is longer than the other. This matches the required output for the simple cases shown in your question.
If you require a different pairing algorithm, adjust EnumerableExtensions.Merge()
as necessary.
This generates the following CSV:
personId,Name,BookName,AuthorName,ChapterName
1,name1,Book1Name,AuthorName1,chapterName1
,,,AuthorName2,
,,Book2Name,AuthorName1,chapterName1
2,name2,Book3Name,AuthorName3,chapterName3
3,name3,Book1Name,,chapterName4
,,,,chapterName5
Which results in the following table when imported into Excel:
Demo fiddle here.