Search code examples
azure-devopsterraformazure-sql-serverazure-pipelines-yamlinvoke-sqlcmd

Can an Azure SQL server be unavailable / unreachable for a period of time after creation?


I am creating an Azure SQL server & db via a terraform task within an Azure Devops Pipeline using hosted agents.

The resources create without issue.

However, the next job in my stage tries to run a db_init.sql script against the newly created db using the Azure SQL Database Deployment task. This task will fail intermittently with the following error:

Invoke-Sqlcmd -ServerInstance "[redacted].database.windows.net" -Database "[redacted]" -Username "[redacted]" -Password ****** -Inputfile "C:\Users\VssAdministrator\AppData\Local\Temp\tmp6F48.tmp" -ConnectionTimeout 120

##[error]System.Management.Automation.RuntimeException: No resource found with serverName [redacted], serverType Microsoft.Sql/servers in subscription [redacted]. Specify the correct serverName/serverType and try again.Check out how to troubleshoot failures at https://aka.ms/sqlazuredeployreadme#troubleshooting-

The troubleshooting link above has not been useful. I have also found this question on SO but the answer there is not applicable.

The pipeline setup is as follows:


stages:
- stage: rg
  displayName: Create Resource Group
  ...

- stage: db
  dependsOn: 
    - rg
  displayName: Create DB Stage

  jobs:
    - job: createDB
      displayName: 'Create Database'

      steps:
        
         ...

       - task: TerraformTaskV1@0
         name: terraformApply
         displayName: Terraform Apply
         inputs: ...

       - task: PowerShell@2
         displayName: Read terraform outputs
         name: terraformOutput
         inputs:
           pwsh: true
           targetType: 'inline'
           script: |
               $terraformOutput = Get-Content "$(terraformApply.jsonOutputVariablesPath)" | ConvertFrom-Json
               $terraformOutput | Get-Member -MemberType NoteProperty | % { $o = $terraformOutput.($_.Name); Write-Host "##vso[task.setvariable variable=$($_.Name);isoutput=true;issecret=$($o.sensitive)]$($o.value)" }
           failOnStderr: true
           showWarnings: true

    - job:
      displayName: Initialise DB
      dependsOn: createDB
      variables:
        serverFQDN: $[ dependencies.createDB.outputs['terraformOutput.sql_server_fqdn'] ] 
        dbName: $[ dependencies.createDB.outputs['terraformOutput.sql_db_name'] ] 
        serverName: $[ dependencies.createDB.outputs['terraformOutput.server_name'] ]
      pool:
        vmImage: 'windows-latest'
      steps:
        - task: SqlAzureDacpacDeployment@1
          displayName: Initialise DB
          inputs:
            azureSubscription: '[azure subscription id]'
            AuthenticationType: 'server'
            ServerName: '$(serverFQDN)'
            DatabaseName: '$(dbName)'
            SqlUsername: '[sql admin username]'
            SqlPassword: '[sql admin password]'
            deployType: 'SqlTask'
            SqlFile: '$(System.DefaultWorkingDirectory)/[path-to]/db_init.sql'
            IpDetectionMethod: 'AutoDetect'


The issue is infrequent and the pipeline often succeeds after retrying the failed step. However, I have occasionally seen the 'Initialise DB' step fail 3 or 4 retries in a row over a period of several minutes.

I have also tested using an Azure CLI task to wait for the db server (testing both --created & --exists flags) before attempting to initialise but this does not seem to help.

- task: AzureCLI@2
  displayName: 'Wait for DB to be ready'
  inputs:
     azureSubscription: '$(azure-subscription-id)'
     scriptType: bash
     scriptLocation: inlineScript
     inlineScript: |
         # wait up to 5 minutes for the db server to be available. check every 10 seconds
         az sql server wait --exists --interval 10 --timeout 300 --resource-group [rg name] --name $(serverName)
     failOnStderr: true

EDIT 2021-03-05: On deeper investigation this issue seems to be caused by a problem with an Azure API call used by the pipeline task. In order to set the firewall rule, the task attempts to retrieve the id of the target Azure Sql server. To do so, it calls GET https://management.azure.com//subscriptions/[subscription id]/resources?api-version=2014-04-01 and loops through the response to find a Sql server resource with a matching name. On some occasions I have seen this action fail >30 mins after the db server has been created, while I have been able to query it within the Azure portal.


Solution

  • We are glad to see you have found the root cause! We could learn from you new edit:

    EDIT 2021-03-05:

    • On deeper investigation this issue seems to be caused by a problem with an Azure API call used by the pipeline task. In order to set the firewall rule, the task attempts to retrieve the id of the target Azure Sql server. To do so, it calls GET https://management.azure.com//subscriptions/[subscription id]/resources?api-version=2014-04-01 and loops through the response to find a Sql server resource with a matching name. On some occasions I have seen this action fail >30 mins after the db server has been created, while I have been able to query it within the Azure portal.

    I Help you post it as answer and this can be beneficial to other community members. Thank you for you sharing!