I am trying to download some data from a REST API using Azure Data Factory. I have successfully downloaded it to Azure Storage as a JSON file, but the data as received looks like this:
{
  "@odata.context": "https://url/endpoint",
  "@odata.nextLink": "https://url/endpoint?$skiptoken=token",
  "value": [ { Page1Item1 }, { Page1Item2 } ]
}
{
  "@odata.context": "https://url/endpoint",
  "@odata.nextLink": "https://url/endpoint?$skiptoken=token",
  "value": [ { Page2Item1 }, { Page2Item2 } ]
}
Each of these response objects is on its own line in the JSON file. I'd like to unroll them and create a JSON file with one record per item, keeping the columns of the individual items from those pages - e.g. if the item data structure was
{
"name": "Name",
"desc": "Description"
}
then I'd want the output file to be (order of items not important):
{ "name": "Page1Item1Name", "desc": "Page1Item1Desc" }
{ "name": "Page1Item2Name", "desc": "Page1Item2Desc" }
{ "name": "Page2Item1Name", "desc": "Page2Item1Desc" }
{ "name": "Page2Item2Name", "desc": "Page2Item2Desc" }
I've managed to achieve this with a data flow using the "flatten" transformation with unroll by set to value. However, to get the data flow to run in a pipeline I have had to use an inline data source to read the JSON file, and if I try to use a dynamic expression in the container field of the source, it will not let me start the pipeline, giving me this error:
"code":"BadRequest",
"message":null,
"target":"pipeline//runid/{run guid}",
"details":null,
"error":null
When I look at the script definition of the source, it shows the container as 'containername' if I have it hardcoded (which works), and as ($parametername) - note the autogenerated parentheses - if I have it linked to the parameter, which doesn't work. The same error occurs if I create a dataset and use a dataset parameter.
The sink dataset for the same data flow is parameterised using @dataset().clientname, but that one works. By changing pieces of the data flow around, I have determined that the issue is in the source, specifically in the container name field.
How can I allow the pipeline to specify which container the data flow should read data from?
Here's the data flow script:
parameters{
clientname as string
}
source(output(
{@odata.context} as string,
{@odata.nextLink} as string,
value as (businessPhones as string[], displayName as string, givenName as string, id as string, jobTitle as string, mail as string, mobilePhone as string, officeLocation as string, preferredLanguage as string, surname as string, userPrincipalName as string)[]
),
useSchema: false,
allowSchemaDrift: true,
validateSchema: false,
ignoreNoFilesFound: false,
format: 'json',
container: 'containername',
folderPath: 'intune',
fileName: 'users.json',
documentForm: 'documentPerLine') ~> loadjsonusers
loadjsonusers foldDown(unroll(value),
mapColumn(
businessPhones = value.businessPhones,
displayName = value.displayName,
givenName = value.givenName,
jobTitle = value.jobTitle,
mail = value.mail,
mobilePhone = value.mobilePhone,
officeLocation = value.officeLocation,
preferredLanguage = value.preferredLanguage,
surname = value.surname,
userPrincipalName = value.userPrincipalName,
id = value.id
),
skipDuplicateMapInputs: false,
skipDuplicateMapOutputs: false) ~> flattenbyvalue
flattenbyvalue sink(allowSchemaDrift: true,
validateSchema: false,
partitionFileNames:['users-flattened.json'],
skipDuplicateMapInputs: true,
skipDuplicateMapOutputs: true,
partitionBy('hash', 1)) ~> saveflattenedusers
If I try to parameterise it, it replaces the container line with this:
container: ($clientname),
The data flow script that you have provided is correct. The issue seems to be in the value that you are giving for the data flow parameter. You can pass the value for the data flow parameter in the pipeline as described below.
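In the Execute Data Flow activity that runs this data flow, the value for clientname is set on the activity's Parameters tab. If you enter it as a data flow expression, a literal container name must be wrapped in single quotes ('containername'); an unquoted name is parsed as an expression rather than a string and can fail validation with an opaque BadRequest like the one above. As a rough sketch only - the activity and data flow names below are placeholders, and the exact serialization your factory produces may differ slightly - the relevant part of the pipeline JSON would look something like this:
{
    "name": "Run flatten data flow",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataflow": {
            "referenceName": "flattenusers",
            "type": "DataFlowReference",
            "parameters": {
                "clientname": "'containername'"
            }
        }
    }
}
If the container name should come from the pipeline rather than being hardcoded, the Parameters tab also accepts pipeline expressions; one commonly used pattern is to interpolate the pipeline parameter inside the quotes, e.g. '@{pipeline().parameters.clientname}', so that the data flow still receives a properly quoted string literal.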