As you know, Azure Virtual Machine Scale Sets let you create a group of identical virtual machines, which will then scale up or down to match the number of requests or follow a defined schedule.
That said, what if, for some reason, the application hosted on one or more of the virtual machines in the scale set is not responding properly? Until now, you had to handle the issue manually.
Well, good news: a new feature for virtual machine scale sets is now in preview to automatically delete the unhealthy instance(s).
As this is in preview, you first need to register the provider using either the REST API or PowerShell.
With the REST API:
POST on '/subscriptions/<your subscription ID>/providers/Microsoft.Features/providers/Microsoft.Compute/features/RepairVMScaleSetInstancesPreview/register?api-version=2015-12-01'
After a few minutes, you can check the status:
GET on '/subscriptions/<your subscription ID>/providers/Microsoft.Features/providers/Microsoft.Compute/features/RepairVMScaleSetInstancesPreview?api-version=2015-12-01'
Once the feature shows as registered, register the Microsoft.Compute provider:
POST on '/subscriptions/<your subscription ID>/providers/Microsoft.Compute/register?api-version=2015-12-01'
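The three REST calls above differ only in their path, so it can help to build them from one place and to check the status response in code. The following is a minimal Python sketch, not an official client: the helper names are hypothetical, the paths and api-version are copied from the calls above, and it assumes the GET response reports the state under properties.state. The paths are relative to the Azure Resource Manager endpoint and still need an authenticated HTTP client of your choice.

```python
# Sketch only: helper names are hypothetical; paths and api-version
# are copied from the REST calls listed in this post.
API_VERSION = "2015-12-01"
FEATURE = "RepairVMScaleSetInstancesPreview"


def feature_path(subscription_id: str, register: bool = False) -> str:
    """Path of the preview feature; with register=True, the POST registration path."""
    path = (
        f"/subscriptions/{subscription_id}/providers/Microsoft.Features"
        f"/providers/Microsoft.Compute/features/{FEATURE}"
    )
    if register:
        path += "/register"
    return f"{path}?api-version={API_VERSION}"


def provider_register_path(subscription_id: str) -> str:
    """Path of the final POST that registers the Microsoft.Compute provider."""
    return (
        f"/subscriptions/{subscription_id}/providers/Microsoft.Compute"
        f"/register?api-version={API_VERSION}"
    )


def is_registered(status_body: dict) -> bool:
    """True when the status response reports the feature as registered.

    Assumption: the state is found under properties.state.
    """
    return status_body.get("properties", {}).get("state") == "Registered"
```

A caller would POST to feature_path(sub, register=True), poll a GET on feature_path(sub) until is_registered returns True, then POST to provider_register_path(sub).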
With PowerShell:
Register-AzureRmResourceProvider -ProviderNamespace Microsoft.Compute
Register-AzureRmProviderFeature -ProviderNamespace Microsoft.Compute -FeatureName RepairVMScaleSetInstancesPreview
After a few minutes, you can check the status using:
Get-AzureRmProviderFeature -ProviderNamespace Microsoft.Compute -FeatureName RepairVMScaleSetInstancesPreview
Then you can enable health monitoring with either the Application Health extension (https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-health-extension) or a Load Balancer health probe (https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-custom-probe-overview).
During the preview, the auto-repair feature only works for scale sets deployed within a single placement group (the singlePlacementGroup property must be set to true).
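Both prerequisites (health monitoring and a single placement group) can be checked before turning the feature on. Here is a rough Python sketch that inspects a scale set definition already loaded as a dictionary, for example from an exported template or a REST response. The function name is hypothetical, and the property layout it reads (properties.singlePlacementGroup, a healthProbe under the network profile, an extension whose type starts with ApplicationHealth) is an assumption about the resource JSON rather than something stated in this post.

```python
def repair_prerequisites(scale_set: dict) -> list:
    """Return a list of problems; an empty list means both checks passed.

    Assumption: scale_set follows the usual resource JSON layout.
    """
    props = scale_set.get("properties", {})
    profile = props.get("virtualMachineProfile", {})
    problems = []

    # Auto-repair in preview needs a single placement group.
    if props.get("singlePlacementGroup") is not True:
        problems.append("singlePlacementGroup must be true")

    # Health monitoring: either a load balancer probe or the health extension.
    has_probe = bool(profile.get("networkProfile", {}).get("healthProbe"))
    extensions = profile.get("extensionProfile", {}).get("extensions", [])
    has_extension = any(
        (ext.get("properties", {}).get("type") or "").startswith("ApplicationHealth")
        for ext in extensions
    )
    if not (has_probe or has_extension):
        problems.append("no health probe or Application Health extension found")

    return problems
```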
If you want to enable the feature when creating a new scale set, use the following command. Note the EnableAutomaticRepair parameter:
New-AzVmssConfig -Location <location> -SkuCapacity <SKU capacity> -SkuName <SKU name> -UpgradePolicyMode "Automatic" -EnableAutomaticRepair $true -AutomaticRepairGracePeriod "PT30M"
You can also update an existing scale set to enable the feature using the command:
Update-AzVmss -ResourceGroupName <your resource group> -VMScaleSetName <your scale set> -EnableAutomaticRepair $true -AutomaticRepairGracePeriod "PT40M"
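The grace period values in the two commands ("PT30M", "PT40M") are ISO 8601 durations, which are easy to mistype. As an illustration only, this small Python helper (a hypothetical name, handling just the hours-and-minutes form used here) converts such a value to minutes so it can be sanity-checked before being passed to the cmdlet.

```python
import re

# Accepts only the PT<hours>H<minutes>M subset of ISO 8601 durations.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?$")


def grace_period_minutes(value: str) -> int:
    """Convert a duration such as 'PT30M' or 'PT1H30M' to whole minutes."""
    match = _DURATION.match(value)
    if not match or not any(match.groups()):
        raise ValueError(f"not a PT..H..M duration: {value!r}")
    hours, minutes = (int(group) if group else 0 for group in match.groups())
    return hours * 60 + minutes


print(grace_period_minutes("PT30M"))  # 30
print(grace_period_minutes("PT40M"))  # 40
```

Anything that does not match the pattern, such as a bare "30", raises a ValueError instead of being silently accepted.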