I'm trying to join two DataTables on a dynamic number of columns. I've gotten as far as the code below. The problem is the ON statement of the join. How can I make this dynamic based on how many column names are in the list "joinColumnNames".
I was thinking I will need to build some sort of expression tree, but I can't find any examples of how to do this with multiple join columns and with the DataRow object which doesn't have properties for each column.
private DataTable Join(List<string> joinColumnNames, DataTable pullX, DataTable pullY)
{
DataTable joinedTable = new DataTable();
// Add all the columns from pullX
foreach (string colName in joinColumnNames)
{
joinedTable.Columns.Add(pullX.Columns[colName]);
}
// Add unique columns from PullY
foreach (DataColumn col in pullY.Columns)
{
if (!joinedTable.Columns.Contains((col.ColumnName)))
{
joinedTable.Columns.Add(col);
}
}
var Join = (from PX in pullX.AsEnumerable()
join PY in pullY.AsEnumerable() on
// This must be dynamic and join on every column mentioned in joinColumnNames
new { A = PX[joinColumnNames[0]], B = PX[joinColumnNames[1]] } equals new { A = PY[joinColumnNames[0]], B = PY[joinColumnNames[1]] }
into Outer
from PY in Outer.DefaultIfEmpty<DataRow>(pullY.NewRow())
select new { PX, PY });
foreach (var item in Join)
{
DataRow newRow = joinedTable.NewRow();
foreach (DataColumn col in joinedTable.Columns)
{
var pullXValue = item.PX.Table.Columns.Contains(col.ColumnName) ? item.PX[col.ColumnName] : string.Empty;
var pullYValue = item.PY.Table.Columns.Contains(col.ColumnName) ? item.PY[col.ColumnName] : string.Empty;
newRow[col.ColumnName] = (pullXValue == null || string.IsNullOrEmpty(pullXValue.ToString())) ? pullYValue : pullXValue;
}
joinedTable.Rows.Add(newRow);
}
return joinedTable;
}
Adding a specific example to show input/output using 3 join columns (Country, Company, and DateId):
Pull X:
Country Company DateId Sales United States Test1 Ltd 20160722 $25 Canada Test3 Ltd 20160723 $30 Italy Test4 Ltd 20160724 $40 India Test2 Ltd 20160725 $35
Pull Y:
Country Company DateId Downloads United States Test1 Ltd 20160722 500 Mexico Test2 Ltd 20160723 300 Italy Test4 Ltd 20160724 900
Result:
Country Company DateId Sales Downloads United States Test1 Ltd 20160722 $25 500 Canada Test3 Ltd 20160723 $30 Mexico Test2 Ltd 20160723 300 Italy Test4 Ltd 20160724 $40 900 India Test2 Ltd 20160725 $35
var Join =
from PX in pullX.AsEnumerable()
join PY in pullY.AsEnumerable()
on string.Join("\0", joinColumnNames.Select(c => PX[c]))
equals string.Join("\0", joinColumnNames.Select(c => PY[c]))
into Outer
from PY in Outer.DefaultIfEmpty<DataRow>(pullY.NewRow())
select new { PX, PY };
Another way is to have both DataTable
in a DataSet
and use DataRelation
How To: Use DataRelation to perform a join on two DataTables in a DataSet?