I want to create an Azure HDI cluster that can access both ADLSg1 and ADLSg2 data lakes. Is this supported?
This is possible for Spark 2.4 (HDI 4.0), with restrictions: you need to add the required core-site.xml configurations manually, either via the Ambari UI or via SSH.

Steps:
For ADLS Gen 1:
fs.adl.oauth2.access.token.provider.type = ClientCredential
fs.adl.oauth2.client.id = <ADLSg1 Application ID>
fs.adl.oauth2.credential = <ADLSg1 Client Secret>
fs.adl.oauth2.refresh.url = https://login.microsoftonline.com/<Tenant ID>/oauth2/token
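As an alternative to editing core-site.xml cluster-wide, the same Gen 1 properties can be supplied per Spark session through Spark's "spark.hadoop." configuration prefix, which forwards them to the Hadoop configuration. A minimal sketch, assuming PySpark; the app name is arbitrary and every angle-bracket placeholder must be replaced with your own values:

    from pyspark.sql import SparkSession

    # Sketch: pass the ADLS Gen 1 OAuth properties via "spark.hadoop."
    # instead of editing core-site.xml. Placeholders must be replaced.
    spark = (
        SparkSession.builder
        .appName("adls-gen1-access")
        .config("spark.hadoop.fs.adl.oauth2.access.token.provider.type", "ClientCredential")
        .config("spark.hadoop.fs.adl.oauth2.client.id", "<ADLSg1 Application ID>")
        .config("spark.hadoop.fs.adl.oauth2.credential", "<ADLSg1 Client Secret>")
        .config("spark.hadoop.fs.adl.oauth2.refresh.url", "https://login.microsoftonline.com/<Tenant ID>/oauth2/token")
        .getOrCreate()
    )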
For ADLS Gen 2:
fs.azure.account.auth.type.<ADLSg2 storage account name>.dfs.core.windows.net = OAuth
fs.azure.account.oauth.provider.type.<ADLSg2 storage account name>.dfs.core.windows.net = org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id.<ADLSg2 storage account name>.dfs.core.windows.net = <ADLSg2 Application ID>
fs.azure.account.oauth2.client.secret.<ADLSg2 storage account name>.dfs.core.windows.net = <ADLSg2 Client Secret>
fs.azure.account.oauth2.client.endpoint.<ADLSg2 storage account name>.dfs.core.windows.net = https://login.microsoftonline.com/<Tenant ID>/oauth2/token
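The Gen 2 properties can likewise be set per session with the "spark.hadoop." prefix. A minimal sketch, again assuming PySpark; the suffix variable just keeps the long per-account keys readable, and all placeholders must be replaced:

    from pyspark.sql import SparkSession

    # Sketch: set the ADLS Gen 2 OAuth properties per session through
    # "spark.hadoop.". Placeholders must be replaced with your values.
    suffix = "<ADLSg2 storage account name>.dfs.core.windows.net"
    spark = (
        SparkSession.builder
        .appName("adls-gen2-access")
        .config("spark.hadoop.fs.azure.account.auth.type." + suffix, "OAuth")
        .config("spark.hadoop.fs.azure.account.oauth.provider.type." + suffix,
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
        .config("spark.hadoop.fs.azure.account.oauth2.client.id." + suffix,
                "<ADLSg2 Application ID>")
        .config("spark.hadoop.fs.azure.account.oauth2.client.secret." + suffix,
                "<ADLSg2 Client Secret>")
        .config("spark.hadoop.fs.azure.account.oauth2.client.endpoint." + suffix,
                "https://login.microsoftonline.com/<Tenant ID>/oauth2/token")
        .getOrCreate()
    )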
To access files from the cluster, use the fully qualified name. With this approach, you provide the full path to the file that you want to access:
adl://<data_lake_account>.azuredatalakestore.net/<cluster_root_path>/<file_path>
abfs://<containername>@<accountname>.dfs.core.windows.net/<file_path>
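To make that concrete, here is a short PySpark sketch, assuming a session configured as above, that reads one file from each lake in the same session; every path component is a placeholder to substitute:

    # Read one file from each lake using fully qualified URIs.
    df_gen1 = spark.read.text(
        "adl://<data_lake_account>.azuredatalakestore.net/<cluster_root_path>/<file_path>")
    df_gen2 = spark.read.text(
        "abfs://<containername>@<accountname>.dfs.core.windows.net/<file_path>")
    # Both DataFrames live in the same session, so they can be joined,
    # unioned, or written back to either lake.
    print(df_gen1.count(), df_gen2.count())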