I'm having trouble accessing external resources from nodes (RHEL) running in a VM Scale Set.
To sketch the environment I'm trying to describe using Azure Resource Manager Templates, I'm looking to create:
As a footnote, I'm comfortable writing AWS CloudFormation files in YAML, so I'm handling Azure Resource Manager Templates in a similar way, for the sake of readability and the added ability to put comments in my template.
An example of my VMSS config (short snippet):
... # (the YAML template is first converted to JSON and then deployed using the Azure CLI)
# Cluster
# -------
# Scale Set
# ---------
#  | VM Scale Set can not connect to external sources
#  |
- type: Microsoft.Compute/virtualMachineScaleSets
  name: '[variables(''vmssName'')]'
  location: '[resourceGroup().location]'
  apiVersion: '2017-12-01'
  dependsOn:
  - '[variables(''vnetName'')]'
  - '[variables(''loadBalancerName'')]'
  - '[variables(''networkSecurityGroupName'')]'
  sku:
    capacity: '[variables(''instanceCount'')]' # Amount of nodes to be spawned
    name: Standard_A2_v2
    tier: Standard
  # zones: # If zone is specified, no sku can be chosen
  # - '1'
  properties:
    overprovision: 'true'
    upgradePolicy:
      mode: Manual
    virtualMachineProfile:
      networkProfile:
        networkInterfaceConfigurations:
        - name: '[variables(''vmssNicName'')]'
          properties:
            ipConfigurations:
            - name: '[variables(''ipConfigName'')]'
              properties:
                loadBalancerBackendAddressPools:
                - id: '[variables(''lbBackendAddressPoolsId'')]'
                loadBalancerInboundNatPools:
                - id: '[variables(''lbInboundNatPoolsId'')]'
                subnet:
                  id: '[variables(''subnetId'')]'
            primary: true
            networkSecurityGroup:
              id: '[variables(''networkSecurityGroupId'')]'
      osProfile:
        computerNamePrefix: '[variables(''vmssName'')]'
        adminUsername: '[parameters(''sshUserName'')]'
        # adminPassword: '[parameters(''adminPassword'')]'
        linuxConfiguration:
          disablePasswordAuthentication: True
          ssh:
            publicKeys:
            - keyData: '[parameters(''sshPublicKey'')]'
              path: '[concat(''/home/'',parameters(''sshUserName''),''/.ssh/authorized_keys'')]'
      storageProfile:
        imageReference: '[variables(''clusterImageReference'')]'
        osDisk:
          caching: ReadWrite
          createOption: FromImage
...
The Network Security Group referenced from the template above is:
# NetworkSecurityGroup
# --------------------
- type: Microsoft.Network/networkSecurityGroups
  name: '[variables(''networkSecurityGroupName'')]'
  apiVersion: '2017-10-01'
  location: '[resourceGroup().location]'
  properties:
    securityRules:
    - name: remoteConnection
      properties:
        priority: 101
        access: Allow
        direction: Inbound
        protocol: Tcp
        description: Allow SSH traffic
        sourceAddressPrefix: '*'
        sourcePortRange: '*'
        destinationAddressPrefix: '*'
        destinationPortRange: '22'
    - name: allow_outbound_connections
      properties:
        description: This rule allows outbound connections
        priority: 200
        access: Allow
        direction: Outbound
        protocol: '*'
        sourceAddressPrefix: 'VirtualNetwork'
        sourcePortRange: '*'
        destinationAddressPrefix: '*'
        destinationPortRange: '*'
And the load balancer, where I assume the error lies, is described as:
# Loadbalancer as NatGateway
# --------------------------
- type: Microsoft.Network/loadBalancers
  name: '[variables(''loadBalancerName'')]'
  apiVersion: '2017-10-01'
  location: '[resourceGroup().location]'
  sku:
    name: Standard
  dependsOn:
  - '[variables(''natIPAddressName'')]'
  properties:
    backendAddressPools:
    - name: '[variables(''lbBackendPoolName'')]'
    frontendIPConfigurations:
    - name: LoadBalancerFrontEnd
      properties:
        publicIPAddress:
          id: '[variables(''natIPAddressId'')]'
    inboundNatPools:
    - name: '[variables(''lbNatPoolName'')]'
      properties:
        backendPort: '22'
        frontendIPConfiguration:
          id: '[variables(''frontEndIPConfigID'')]'
        frontendPortRangeStart: '50000'
        frontendPortRangeEnd: '50099'
        protocol: tcp
I keep reading articles about configuring SNAT with port masquerading, but I can't find relevant examples of such a setup.
Any help is greatly appreciated.
It took a lot of searching, but the Azure article on Azure Load Balancer outbound connections (Scenario #2) states that a load-balancing rule (and a complementary health probe) is necessary for SNAT to function.
The new code for the load balancer became:
...
- type: Microsoft.Network/loadBalancers
  name: '[variables(''loadBalancerName'')]'
  apiVersion: '2017-10-01'
  location: '[resourceGroup().location]'
  sku:
    name: Standard
  dependsOn:
  - '[variables(''natIPAddressName'')]'
  properties:
    backendAddressPools:
    - name: '[variables(''lbBackendPoolName'')]'
    frontendIPConfigurations:
    - name: LoadBalancerFrontEnd
      properties:
        publicIPAddress:
          id: '[variables(''natIPAddressId'')]'
    probes: # Needed for loadBalancingRule to work
    - name: '[variables(''lbProbeName'')]'
      properties:
        protocol: Tcp
        port: 22
        intervalInSeconds: 5
        numberOfProbes: 2
    loadBalancingRules: # Needed for SNAT to work
    - name: '[concat(variables(''loadBalancerName''),''NatRule'')]'
      properties:
        disableOutboundSnat: false
        frontendIPConfiguration:
          id: '[variables(''frontEndIPConfigID'')]'
        backendAddressPool:
          id: '[variables(''lbBackendAddressPoolsId'')]'
        probe:
          id: '[variables(''lbProbeId'')]'
        protocol: tcp
        frontendPort: 80
        backendPort: 80
...
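The snippet above references a few variables (`lbProbeName`, `lbProbeId`, `frontEndIPConfigID`, `lbBackendAddressPoolsId`) whose definitions are not shown. As a rough sketch only, such IDs are commonly built with the `resourceId()` template function. The plain name values below are assumptions made up for illustration and are not taken from the original template; the frontend name `LoadBalancerFrontEnd` is the one used in the load balancer above.

```yaml
# Hypothetical variables section; the plain names are illustrative assumptions.
variables:
  loadBalancerName: myLoadBalancer     # assumed name
  lbBackendPoolName: myBackendPool     # assumed name
  lbProbeName: sshProbe                # assumed name
  # Child-resource IDs derived from the load balancer name:
  frontEndIPConfigID: '[resourceId(''Microsoft.Network/loadBalancers/frontendIPConfigurations'', variables(''loadBalancerName''), ''LoadBalancerFrontEnd'')]'
  lbBackendAddressPoolsId: '[resourceId(''Microsoft.Network/loadBalancers/backendAddressPools'', variables(''loadBalancerName''), variables(''lbBackendPoolName''))]'
  lbProbeId: '[resourceId(''Microsoft.Network/loadBalancers/probes'', variables(''loadBalancerName''), variables(''lbProbeName''))]'
```

Deriving the IDs from the same name variables used in the resource definitions keeps the probe and rule references in sync if the load balancer is renamed.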