Tags: azure, azure-functions, azure-storage, azure-data-lake, azure-virtual-network

Azure Data Lake: This request is not authorized to perform this operation


This question is not a duplicate of This request is not authorized to perform this operation. Azure blobClient

I want to access an Azure Data Lake from an Azure Function. I use Managed Identity and have assigned the "Owner" role to this function in my Data Lake's IAM tab. Everything works, but only when I allow "All networks" in my data lake. As soon as I switch to "Selected networks", I get "This request is not authorized to perform this operation", even though I added all outbound IP addresses of my Azure Function to the firewall rules of my data lake.

I added a call to https://api.ipify.org in my Azure Function to see its public IP address. I also use socket.gethostbyname() to get the resolved IP address of the data lake's host name (to check whether it gets resolved to a private IP within Microsoft's data center, but it doesn't: it resolves to a public IP).

(screenshot: log output showing the function's public outbound IP and the data lake host resolving to a public IP)
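The diagnostic above can be sketched as follows (the storage account host name is a placeholder, not from the original post); it also flags whether either address is private, which turns out to matter for the answer:

```python
import ipaddress
import socket
import urllib.request


def is_private(ip: str) -> bool:
    # True for private / RFC 1918 addresses such as 172.16.0.5
    return ipaddress.ip_address(ip).is_private


def outbound_ip() -> str:
    # api.ipify.org echoes the caller's public IP as plain text
    with urllib.request.urlopen("https://api.ipify.org") as resp:
        return resp.read().decode()


if __name__ == "__main__":
    func_ip = outbound_ip()
    lake_ip = socket.gethostbyname("mydatalake.dfs.core.windows.net")
    print(f"function outbound IP: {func_ip} (private: {is_private(func_ip)})")
    print(f"data lake resolves to: {lake_ip} (private: {is_private(lake_ip)})")
```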

The outbound IP of my function also appears in my Firewall rules:

(screenshot: the storage account's firewall rules, including the function's outbound IP)

and still I get

```
[Error] Executed 'Functions.json-deserialize-eventhub' (Failed, Id=10554e46-2f5e-42a7-a9f7-a95dad1b4e30, Duration=1063ms)
Result: Failure
Exception: HttpResponseError: This request is not authorized to perform this operation.
RequestId:84928ea9-201e-0092-5a97-d46e78000000
Time:2020-12-17T17:07:38.2792923Z
ErrorCode:AuthorizationFailure
Error:None
Stack:
  File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py", line 355, in _handle__invocation_request
    call_result = await self._loop.run_in_executor(
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py", line 542, in __run_sync_func
    return func(**params)
  File "/home/site/wwwroot/json-deserialize-eventhub/__init__.py", line 38, in main
    dlc.append_object(obj_converted, data_lake_target_url)
  File "/home/site/wwwroot/shared/datalake.py", line 41, in append_object
    for container in file_system_containers:
  File "/home/site/wwwroot/.python_packages/lib/site-packages/azure/core/paging.py", line 129, in __next__
    return next(self._page_iterator)
  File "/home/site/wwwroot/.python_packages/lib/site-packages/azure/core/paging.py", line 76, in __next__
    self._response = self._get_next(self.continuation_token)
  File "/home/site/wwwroot/.python_packages/lib/site-packages/azure/storage/blob/_models.py", line 401, in _get_next_cb
    process_storage_error(error)
  File "/home/site/wwwroot/.python_packages/lib/site-packages/azure/storage/blob/_shared/response_handlers.py", line 147, in process_storage_error
    raise error
```

The code itself is simple; I basically just use the DataLakeServiceClient from the azure.storage.filedatalake module to communicate with the data lake.
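A minimal sketch of that access pattern, assuming the azure-identity and azure-storage-file-datalake packages; the account name and the enumeration step are placeholders, not the poster's actual code:

```python
def account_url(account: str) -> str:
    # ADLS Gen2 DFS endpoint for a given storage account name
    return f"https://{account}.dfs.core.windows.net"


def list_file_systems(account: str) -> list:
    # Imported here so account_url() above stays dependency-free;
    # requires azure-identity and azure-storage-file-datalake
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url=account_url(account),
        credential=DefaultAzureCredential(),  # picks up Managed Identity in the Function App
    )
    # Enumerating file systems is the kind of call the firewall rejects with
    # "This request is not authorized to perform this operation"
    return [fs.name for fs in service.list_file_systems()]
```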

Why does this not work? How can I allow my function to write/read data? VNet integration is not an option at the moment.


Solution

  • There is a special case when you configure storage accounts to allow access from specific public internet IP address ranges.

    Services deployed in the same region as the storage account use private Azure IP addresses for communication. Thus, you cannot restrict access to specific Azure services based on their public outbound IP address range.

    I also see a private IP, 172.16.0.5, in your screenshots. To verify this, you could create the same Function App in a region different from the Azure Data Lake's region.