I am trying to set up an Azure Data Factory transformation. I have a SQL Server database with three tables: Students, StudentClasses and Classes. I would like to use Azure Data Factory to read these tables and create a JSON structure. The structure is an array of Students. A Student should have a derived field which is an array of Classes. StudentClasses is a mapping that tells which Classes were taken by a given student. It has two fields StudentId and ClassId. I think I should be able to add a derived column to Student that is an array of Classes created using the join transformation. I have gotten as far as setting up the data flows for each of the tables. However, I can't figure out how to embed the array of classes in student. I have tried adding a derived array field to Student. However, it is not clear how to populate it with the content I desire.
I can add a derived field called class to Class with the following expression:
array(classID, className)
The end result I am looking for is:
{
"students": [
{
"studentID": 1,
"studentName": "John Doe",
"classes": [
{
"classID": "CS101-01",
"className": "Introduction to Computer Science"
},
{
"classID": "CS102-01",
"className": "Mathematics for Computer Science"
}
]
},
{
"studentID": 2,
"studentName": "Jane Smith",
"classes": [
// ... (similar structure for Jane Smith)
]
}
// ... (additional students with their respective classes)
]
}
Array(classID, className)
expression in derived column transformation will create an array of ClassID
, ClassName
. But this will not create as proper key name in key:value format created. Also, this will not be in the nested format under each student. To solve this, you can give expression like collect(@(ClassId=ClassId,ClassName=ClassName))
in the aggregate transformation. Below is the detailed approach.
students
and studentsClasses
table based on the studentid
field. Then Join the resultant table with classes
table based on classid
field. Then remove the duplicate columns which came as a result of join transformations using select transformation
.Data Preview of select transformation after joining all three tables:
StudentId | StudentName | ClassId | ClassName |
---|---|---|---|
1 | John Doe | CS101-01 | Introduction to Computer Science |
1 | John Doe | CS102-01 | Mathematics for Computer Science |
2 | Jane Smith | CS101-01 | Introduction to Computer Science |
2 | Jane Smith | CS103-01 | Data Structures and Algorithms |
3 | Bob Johnson | CS102-01 | Mathematics for Computer Science |
3 | Bob Johnson | CS104-01 | Database Systems |
StudentId
and StudentName
in group by fields and in aggregates, create a new column named Classes
and give the expression as collect(@(ClassId=ClassId,ClassName=ClassName))
Data preview of aggregate transformation:
Students
. Give the expression as collect(@(StudentId=StudentId,StudentName=StudentName,Classes=Classes))
Data preview of Aggragate2:
Output Json:
{
"Students": [
{
"StudentId": "1",
"StudentName": "John Doe",
"Classes": [
{
"ClassId": "CS101-01",
"ClassName": "Introduction to Computer Science"
},
{
"ClassId": "CS102-01",
"ClassName": "Mathematics for Computer Science"
}
]
},
{
"StudentId": "2",
"StudentName": "Jane Smith",
"Classes": [
{
"ClassId": "CS101-01",
"ClassName": "Introduction to Computer Science"
...........
}
]
}
}