One of our main activities at Claranet is providing Cloud Managed Services, both on our private cloud and on public cloud providers such as AWS or Azure. For the latter, we are a recognized MSP and are positioned as a leader in the Gartner Magic Quadrant. In this role, we maintain and operate many IaaS workloads on Azure for our customers. An important part of these workloads consists of “classical” VMs (meaning they’re not part of a Scale Set) that were migrated to or built on Azure. These VMs come with the usual galaxy of tooling for backup, monitoring, update management, configuration management, log management, etc. For some of these tools, switching from a legacy service to a cloud provider service is quite easy and seems only logical. For example, choosing Azure Backup as the backup and restore solution can be an obvious decision.
Update management is a sensitive matter, and switching from an established, well-mastered solution to a cloud provider service can be a little scary. Will we keep the same level of control? Will we be able to quickly push a critical security update, or avoid deploying a patch that crashes our application? These are legitimate questions, so let’s find out whether Azure services have some answers.
Why should I even look at this?
The first thing that comes to mind when choosing an Update Management solution is to use the tools already deployed in the company, such as WSUS, SCCM or Red Hat Satellite. This can prove to be an efficient strategy, as Azure VMs would be integrated into a well-known and (in most cases) well-managed process. The IT team then manages updates on Azure VMs as it does on other platforms (e.g. on-premises).
There are a few things to consider with that approach:
- Azure VMs must connect to on-premises update servers or repositories through a VPN or ExpressRoute connection: outbound from Azure to report to them, and inbound to Azure to download updates.
- These solutions rarely provide update scheduling on the target VM. Update configuration is therefore deployed to the VMs with another tool (AD GPO, Puppet, …), and schedules or executions are managed by yet another one (such as Ansible). The IT team must maintain these linked tools as well.
- Azure VMs don’t benefit from the cloud services built to lighten repetitive administrative tasks.
Azure provides a set of solutions that ease update management with a slightly different mindset. They are intended to reduce the IT team’s administrative effort by providing an automated service with lighter functionality. Keep in mind that these services are built to let companies focus on their core business, at the cost of some precision over low-value technical actions. Put simply, the IT team delegates Update Management to Azure and gains time to focus on more complex tasks.
Before considering a world without a company-managed update server or repository, one question is worth asking: how many faulty updates were encountered, and what was their impact? If the ratio is quite low, the Azure solutions are worth a try. The update process can be adapted if needed, always with the aim of reducing the risk of deploying a faulty update on a production VM.
Two services can help companies deploy updates, each with a different level of control. Azure Update Management lets you create schedules that define the when (time and recurrence), the who (target VMs) and the which (updates). Azure Automatic Updates by Platform handles the when and the which, leaving you to decide which VMs should benefit from it. Let’s dig into these services.
Note: on Windows, these services only deploy Operating System updates, not middleware or application updates. On Linux, all updates handled by the package manager (such as yum or apt) can be included.
Azure Update Management (Automation)
The main feature for managing updates is Azure Update Management, a free solution based on Azure Automation that relies on a Log Analytics Workspace. Its behavior is simple. It can be seen as a scheduler that sends orders to the VMs to update themselves within a few constraints (such as update classifications). When a VM receives the command, its update service uses the VM’s local configuration to download and install the updates. More details are available in the Microsoft documentation: https://docs.microsoft.com/en-us/azure/automation/update-management/manage-updates-for-vm.
This means that if a WSUS server is configured as the update server, or a specific Linux update repository is declared, the Azure VM will still interact with it. Using a custom update server or repository on top of Azure Update Management is a trade-off to weigh. Is it better to keep control and reporting through an update server, with the drawbacks stated earlier? Or is it better to slightly loosen control over updates by using a public update source and “disconnecting” from the legacy tooling to minimize human interaction? The service can also target non-Azure VMs. It can therefore be extended to VMs deployed on-premises or on another public cloud, becoming a central management point for updates.
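The “scheduler with constraints” model above can be illustrated with a minimal Python sketch. It shows how a schedule’s classifications and KB exclusions narrow down the updates a VM reports as missing. The `Update` record, its fields and the KB numbers are hypothetical placeholders. In practice, the service and the VM’s update agent perform this selection, not code you write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """Hypothetical record for an update a VM reports as missing."""
    kb: str
    classification: str


def select_updates(missing, included_classifications, excluded_kbs=()):
    """Keep only the missing updates that a schedule's constraints allow.

    An update is selected if its classification is included in the schedule
    and its KB number is not explicitly excluded.
    """
    included = set(included_classifications)
    excluded = set(excluded_kbs)
    return [u for u in missing
            if u.classification in included and u.kb not in excluded]


missing = [
    Update("KB0000001", "Security"),
    Update("KB0000002", "Critical"),
    Update("KB0000003", "FeaturePack"),
    Update("KB0000004", "Security"),
]
# Security and Critical updates only, with one KB excluded by hand.
selected = select_updates(missing, {"Security", "Critical"}, excluded_kbs={"KB0000004"})
print([u.kb for u in selected])
```

Exclusions match exact KB numbers. This is why excluding updates means listing them one by one.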
Deploying the solution is easy. You can do it through the Azure Portal or with an Infrastructure as Code tool like Terraform. The AzureRM provider does not yet offer a dedicated resource for it, but a template deployment can be used as a workaround. The prerequisites are:
- A Log Analytics Workspace
- An Automation Account in the same region
- Automation Account and Log Analytics Workspace linked together (can be done during the deployment phase)
- Target VMs connected to the Log Analytics Workspace
The service may be unavailable in some Azure regions, as linking a Log Analytics Workspace and an Automation Account isn’t supported everywhere yet.
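The prerequisites listed above can be turned into a quick pre-flight check. This Python sketch works on a hand-built, hypothetical description of the resources. The dictionary keys are assumptions and do not reflect the shape of an Azure API response. The check follows the list literally, including the same-region rule.

```python
def check_prerequisites(workspace, automation_account, vms):
    """Return a list of problems; an empty list means the setup looks ready.

    Hypothetical input shapes: workspace {"id", "location"}, automation_account
    {"location", "linked_workspace_id"}, and VMs {"name", "workspace_id"}.
    """
    problems = []
    # Same region, as required by the prerequisites list.
    if automation_account["location"] != workspace["location"]:
        problems.append("Automation Account and workspace are in different regions")
    # The Automation Account must be linked to this workspace.
    if automation_account.get("linked_workspace_id") != workspace["id"]:
        problems.append("Automation Account is not linked to the workspace")
    # Every target VM must report to the workspace.
    for vm in vms:
        if vm.get("workspace_id") != workspace["id"]:
            problems.append(f"{vm['name']} is not connected to the workspace")
    return problems


workspace = {"id": "law-demo", "location": "westeurope"}
account = {"location": "westeurope", "linked_workspace_id": "law-demo"}
vms = [{"name": "vm1", "workspace_id": "law-demo"}, {"name": "vm2"}]
print(check_prerequisites(workspace, account, vms))
```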
Deployment — Azure portal
Deployment steps:
- On the Automation Account, go to the Update Management section and click Enable (you can link the Automation Account to the Log Analytics Workspace at this step)
- Add VMs with the dedicated button at the top. Alternatively, enable the solution for all VMs connected to the workspace, including future ones: click Manage machines and select Enable on all available and future machines.
After a few minutes, the VMs appear and report their update status, including any missing updates. Once they report to the solution, VMs can be included in a schedule as patching targets.
For each group of VMs, create a deployment schedule with the corresponding targets and configuration (time, classifications, exclusions, reboot behavior). For targets, prefer dynamic grouping to include all VMs in a specific resource group or with a common tag. The portal configuration of a schedule looks like the following:
When using groups, you define the criteria that Azure uses to select VMs. There’s even a Preview button to check that the targeting is configured accurately.
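To reason about dynamic groups before clicking Preview, here is a minimal Python sketch of the selection logic. It combines a scope of resource groups with tag criteria, matched either on any tag or on all of them. The VM dictionaries, tag names and resource group names are hypothetical. The any/all switch is an assumption meant to mirror the portal’s tag filter option.

```python
def preview_dynamic_group(vms, resource_groups=(), tags=None, match_all=False):
    """Return the names of VMs matching dynamic-group-style criteria.

    vms: dicts with "name", "resource_group" and "tags" (hypothetical shape).
    resource_groups: scopes to search; empty means every VM is in scope.
    tags: {tag_name: [accepted values]}; a VM needs any one matching tag,
    or all of them when match_all is True.
    """
    tags = tags or {}
    selected = []
    for vm in vms:
        # Scope filter first: skip VMs outside the listed resource groups.
        if resource_groups and vm["resource_group"] not in resource_groups:
            continue
        checks = [vm["tags"].get(name) in values for name, values in tags.items()]
        # No tag criteria means every VM in scope is selected.
        if checks and not (all(checks) if match_all else any(checks)):
            continue
        selected.append(vm["name"])
    return selected


vms = [
    {"name": "web1", "resource_group": "rg-prod", "tags": {"env": "prod", "role": "web"}},
    {"name": "web2", "resource_group": "rg-dev", "tags": {"env": "dev", "role": "web"}},
    {"name": "db1", "resource_group": "rg-prod", "tags": {"env": "prod", "role": "db"}},
]
print(preview_dynamic_group(vms, resource_groups=["rg-prod"]))
print(preview_dynamic_group(vms, tags={"env": ["prod"], "role": ["web"]}, match_all=True))
```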
Deployment — Terraform sample
There is no dedicated Terraform resource for deploying an Update Management solution. However, it can be deployed with the azurerm_template_deployment resource, which embeds an ARM template in Terraform. Assuming the prerequisites are in place, the service is enabled as a Log Analytics solution and the schedules are deployed. This is only a simple sample, and the code can obviously be improved. We use our own modules to deploy the other resources, as stated in this article. A complete list of Claranet’s public modules is maintained here.
So, first, enable the solution with the following:
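The original Terraform snippet is not reproduced here. As a hedged stand-in, this Python sketch builds the kind of ARM template that such a template deployment would carry: a `Microsoft.OperationsManagement/solutions` resource named `Updates(<workspace name>)` with the `OMSGallery/Updates` plan. The API version and property names follow common ARM examples and should be checked against the current ARM reference. The subscription ID and resource names are placeholders.

```python
import json


def updates_solution_template(workspace_name, workspace_id, location):
    """Build an ARM template enabling the Updates solution on a workspace.

    Assumption: resource type, apiVersion and plan values follow common ARM
    examples for Log Analytics solutions; verify them before deploying.
    """
    # The solution and its plan are both named after the workspace.
    solution_name = f"Updates({workspace_name})"
    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.OperationsManagement/solutions",
                "apiVersion": "2015-11-01-preview",
                "name": solution_name,
                "location": location,
                "properties": {"workspaceResourceId": workspace_id},
                "plan": {
                    "name": solution_name,
                    "publisher": "Microsoft",
                    "product": "OMSGallery/Updates",
                    "promotionCode": "",
                },
            }
        ],
    }


template = updates_solution_template(
    "law-demo",
    "/subscriptions/<subscription-id>/resourceGroups/rg-demo"
    "/providers/Microsoft.OperationalInsights/workspaces/law-demo",
    "westeurope",
)
# Serialized template, e.g. for the template_body argument of azurerm_template_deployment.
template_body = json.dumps(template, indent=2)
print(template_body)
```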
Then declare and configure each schedule:
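In the same hedged spirit, this Python sketch builds one schedule as an ARM `Microsoft.Automation/automationAccounts/softwareUpdateConfigurations` resource. The schedule targets Windows, includes the Critical and Security classifications, reboots if required, allows a two-hour window, uses tag-based dynamic targeting and runs monthly on the second Tuesday. Property names, enum values and the API version reflect our understanding of the ARM schema and must be verified before use. The account, schedule, scope and tag names are hypothetical.

```python
import json


def update_schedule_resource(account_name, schedule_name, scope, tag_name,
                             tag_values, start_time,
                             classifications=("Critical", "Security"),
                             excluded_kbs=(), reboot="IfRequired",
                             duration="PT2H"):
    """Build one ARM softwareUpdateConfigurations resource (monthly, Windows).

    Assumption: property names, enum values and apiVersion mirror the ARM
    schema as we understand it; verify them before deploying.
    """
    return {
        "type": "Microsoft.Automation/automationAccounts/softwareUpdateConfigurations",
        "apiVersion": "2019-06-01",
        "name": f"{account_name}/{schedule_name}",
        "properties": {
            "updateConfiguration": {
                "operatingSystem": "Windows",
                "windows": {
                    # Classifications as a comma-separated string.
                    "includedUpdateClassifications": ", ".join(classifications),
                    # KBs to skip, declared one by one.
                    "excludedKbNumbers": list(excluded_kbs),
                    "rebootSetting": reboot,
                },
                # Maximum maintenance window, ISO 8601 duration.
                "duration": duration,
                # Dynamic group: VMs in the scope carrying one of the tag values.
                "targets": {
                    "azureQueries": [{
                        "scope": [scope],
                        "tagSettings": {
                            "tags": {tag_name: list(tag_values)},
                            "filterOperator": "Any",
                        },
                    }]
                },
            },
            "scheduleInfo": {
                "frequency": "Month",
                "interval": 1,
                "startTime": start_time,
                "timeZone": "UTC",
                "isEnabled": True,
                # Second Tuesday of each month.
                "advancedSchedule": {
                    "monthlyOccurrences": [{"occurrence": 2, "day": "Tuesday"}]
                },
            },
        },
    }


schedule = update_schedule_resource(
    "aa-demo", "nonprod-monthly",
    "/subscriptions/<subscription-id>/resourceGroups/rg-nonprod",
    "UpdateGroup", ["nonprod"], "2021-04-13T22:00:00+00:00",
)
print(json.dumps(schedule, indent=2))
```

Creating the production schedule only requires calling the function again with another tag value and a later start time.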
For urgent security updates, one workaround is to create a dedicated schedule with a one-time execution date that deploys only security updates. When a security update must be deployed quickly, changing the schedule’s execution date and time is the only action required. As always, if applicable, target non-production VMs first to make sure the updates do not disrupt the application.
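The emergency-schedule workaround boils down to keeping a security-only, one-time configuration ready and moving only its start time. This Python sketch shows the idea on a plain dictionary. The property names mirror the ARM schedule shape as an assumption, and the placeholder dates are hypothetical. In practice, the new start time would be pushed through the portal, Terraform or the ARM API.

```python
import copy
from datetime import datetime, timedelta, timezone

# Hypothetical emergency schedule kept ready: security updates only, run once.
# Property names mirror the ARM schedule shape (assumption to verify).
EMERGENCY_SCHEDULE = {
    "updateConfiguration": {
        "operatingSystem": "Windows",
        "windows": {
            "includedUpdateClassifications": "Security",
            "rebootSetting": "IfRequired",
        },
        "duration": "PT2H",
    },
    "scheduleInfo": {
        "frequency": "OneTime",
        "startTime": "2021-01-01T00:00:00+00:00",  # placeholder date
        "isEnabled": True,
    },
}


def trigger_emergency(schedule, start):
    """Return a copy of the schedule whose only change is its start time."""
    updated = copy.deepcopy(schedule)
    updated["scheduleInfo"]["startTime"] = start.isoformat()
    return updated


# Example: run the emergency patching 30 minutes after a given moment.
now = datetime(2021, 3, 10, 9, 0, tzinfo=timezone.utc)
urgent = trigger_emergency(EMERGENCY_SCHEDULE, now + timedelta(minutes=30))
print(urgent["scheduleInfo"]["startTime"])  # 2021-03-10T09:30:00+00:00
```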
At this point, keep in mind that even if Azure helps us schedule updates, it does not choose which VM should be patched first. Several schedules running at different times should therefore be deployed. They can patch non-production VMs first, so the updates are tested before they reach production. They can also update VMs hosting the same application at different times, to maintain high availability. Every update setup should include at least two schedules. The first targets non-production and runs, for example, just after Microsoft’s Patch Tuesday. The second targets production one or two weeks later, to detect any error or misbehavior caused by a faulty patch.
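The two-schedule rhythm can be computed rather than maintained by hand. This Python sketch derives Patch Tuesday (the second Tuesday of the month), then places the non-production run just after it and the production run two weeks later. Both delays are hypothetical defaults to adapt to your own test cycle.

```python
import calendar
from datetime import date, timedelta


def patch_tuesday(year, month):
    """Return the second Tuesday of the month (Microsoft's usual release day)."""
    first = date(year, month, 1)
    # Days from the 1st to the first Tuesday (weekday(): Monday == 0).
    offset = (calendar.TUESDAY - first.weekday()) % 7
    return first + timedelta(days=offset + 7)


def rollout_dates(year, month, nonprod_delay_days=1, prod_delay_days=14):
    """Non-production just after Patch Tuesday, production two weeks later.

    Both delays are hypothetical defaults; adapt them to your test cycle.
    """
    release = patch_tuesday(year, month)
    return {
        "non-production": release + timedelta(days=nonprod_delay_days),
        "production": release + timedelta(days=prod_delay_days),
    }


for env, day in rollout_dates(2021, 3).items():
    print(env, day.isoformat())
```

For March 2021, Patch Tuesday falls on the 9th. This gives March 10 for non-production and March 23 for production.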
The way Update Management works also makes blue/green update deployments easy. If an application is hosted on multiple VMs, updates can be deployed on some of them using a “blue” tag. Meanwhile, the VMs with the “green” tag are left out and keep serving the app. A second schedule targeting the “green” VMs runs some time later, which leaves time to check the status of the “blue” VMs before propagating the updates. This way, a workload distributed across VMs remains available during the update phase, as in any blue/green process.
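The blue/green pattern can be sketched in Python as two tag-based batches separated by a verification gap. The `UpdateGroup` tag name, the two-day gap and the `Succeeded` status value are hypothetical. In Update Management, each batch would simply be a schedule whose dynamic group targets one tag value.

```python
from datetime import datetime, timedelta


def blue_green_plan(vms, first_run, gap=timedelta(days=2), tag="UpdateGroup"):
    """Split VMs on a (hypothetical) tag into two schedules separated by a gap.

    Returns ([(color, start, vm_names), ...], untagged_vm_names). Untagged VMs
    are returned separately so they are not silently left out of both runs.
    """
    groups = {"blue": [], "green": []}
    untagged = []
    for vm in vms:
        color = vm.get("tags", {}).get(tag)
        if color in groups:
            groups[color].append(vm["name"])
        else:
            untagged.append(vm["name"])
    schedules = [
        ("blue", first_run, groups["blue"]),
        ("green", first_run + gap, groups["green"]),
    ]
    return schedules, untagged


def safe_to_continue(blue_results):
    """Propagate to green only if every blue VM reported a successful run."""
    return bool(blue_results) and all(
        status == "Succeeded" for status in blue_results.values()
    )


vms = [
    {"name": "app1", "tags": {"UpdateGroup": "blue"}},
    {"name": "app2", "tags": {"UpdateGroup": "green"}},
    {"name": "app3", "tags": {}},
]
schedules, untagged = blue_green_plan(vms, datetime(2021, 3, 10, 22, 0))
for color, start, names in schedules:
    print(color, start.isoformat(), names)
print("untagged:", untagged)
print("propagate to green:", safe_to_continue({"app1": "Succeeded"}))
```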
Reporting and logs
After execution, a detailed report is available. Logs can also be queried in the Log Analytics Workspace, provided that Diagnostic Settings have been set up on the Automation Account.
The logs can be used to display results on an Azure dashboard. They can also serve as a trigger event to suspend monitoring while the VM is updating (and most likely rebooting), which avoids needless on-call alerts for the IT team. One of the simplest uses in an Azure dashboard is displaying the success/failure ratio. Run this request and pin the result to a dashboard to get a quick overview of the update status for a project, a region or even overall.
The request:
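The original query is not reproduced here. As an illustration of what such a request computes, this Python sketch derives the success/failure ratio from update run records, either overall or grouped by a key such as a project or region. The record shape (a `status` field with `Succeeded` or `Failed` values, and a `project` field) is hypothetical. In Log Analytics, the same aggregation would be written in KQL against the table that your diagnostic settings populate.

```python
from collections import Counter, defaultdict


def success_ratio(records, group_by=None):
    """Return {group: (succeeded, failed, success_percentage)}.

    records: iterable of dicts with a "status" key (hypothetical shape).
    group_by: optional key such as "project" or "region"; None gives an
    overall view under the "all" key. Other statuses (e.g. runs still in
    progress) are counted but left out of the ratio.
    """
    counts = defaultdict(Counter)
    for record in records:
        group = record.get(group_by, "unknown") if group_by else "all"
        counts[group][record["status"]] += 1
    result = {}
    for group, c in counts.items():
        succeeded, failed = c["Succeeded"], c["Failed"]
        total = succeeded + failed
        percentage = round(100 * succeeded / total, 1) if total else None
        result[group] = (succeeded, failed, percentage)
    return result


records = [
    {"project": "projA", "status": "Succeeded"},
    {"project": "projA", "status": "Succeeded"},
    {"project": "projA", "status": "Succeeded"},
    {"project": "projA", "status": "Failed"},
    {"project": "projB", "status": "Succeeded"},
    {"project": "projB", "status": "InProgress"},
]
print(success_ratio(records))             # overall view
print(success_ratio(records, "project"))  # one entry per project
```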
Alerts can also be set up to send an email reporting failed or successful updates: https://docs.microsoft.com/en-us/azure/automation/update-management/configure-alerts
NB: Azure Update Management is also included in the Azure Automanage feature. Still in preview, Automanage enables a full set of Azure services on a VM (Backup, Insights monitoring, Security Center, Antimalware, Update Management, Change Tracking & Inventory, Log Analytics) with a standardized, non-editable configuration. More details here: https://docs.microsoft.com/fr-fr/azure/automanage/automanage-virtual-machines
Azure automatic updates (for Windows and Linux, in preview)
This feature is still in preview and offers fully Azure-managed update management. Its key features are:
- Applies only Security and Critical updates
- Downloads and installs updates automatically every month, and reboots the VM after applying them
- Applies updates at any time, whenever Azure determines that the VM is in an “off-peak” period
- Never updates VMs in different Availability Zones, or in the same Availability Set, at the same time
- Batches VMs that are not part of an Availability Set on a best-effort basis, to avoid updating all VMs in a subscription at the same time
- Is enabled at VM creation (or later through the CLI)
- The preview initially covered only Windows Server 2012 R2 and above; RHEL 7.x and Ubuntu 18.04 are now also eligible
More details in the Microsoft documentation: https://docs.microsoft.com/fr-fr/azure/virtual-machines/windows/automatic-vm-guest-patching
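Azure does not publish the exact orchestration logic. The following Python sketch is therefore only a conceptual illustration of the batching constraints listed above, not Azure’s algorithm. Zonal VMs are patched zone by zone, no two members of the same Availability Set share a batch, and the remaining VMs are spread over small best-effort batches. The VM fields and the batch size are hypothetical.

```python
def plan_batches(vms, standalone_batch_size=2):
    """Group VMs into sequential patch batches (conceptual, not Azure's logic).

    vms: dicts with "name" and optionally "zone" or "availability_set".
    """
    zones, sets, standalone = {}, {}, []
    for vm in vms:
        if vm.get("zone"):
            zones.setdefault(vm["zone"], []).append(vm["name"])
        elif vm.get("availability_set"):
            sets.setdefault(vm["availability_set"], []).append(vm["name"])
        else:
            standalone.append(vm["name"])

    batches = []
    # One batch per zone, so two zones are never patched at the same time.
    for zone in sorted(zones):
        batches.append(zones[zone])
    # The i-th member of every Availability Set goes into the i-th batch,
    # so two members of the same set are never patched together.
    depth = max((len(members) for members in sets.values()), default=0)
    for i in range(depth):
        batches.append([members[i] for members in sets.values() if i < len(members)])
    # Remaining VMs: small chunks, best effort.
    for i in range(0, len(standalone), standalone_batch_size):
        batches.append(standalone[i:i + standalone_batch_size])
    return batches


vms = [
    {"name": "z1a", "zone": "1"},
    {"name": "z2a", "zone": "2"},
    {"name": "z1b", "zone": "1"},
    {"name": "web1", "availability_set": "as-web"},
    {"name": "web2", "availability_set": "as-web"},
    {"name": "db1", "availability_set": "as-db"},
    {"name": "poc1"},
    {"name": "poc2"},
    {"name": "poc3"},
]
for number, batch in enumerate(plan_batches(vms), start=1):
    print(number, batch)
```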
With this feature, Critical and Security patches are installed without any administrative control from the company. Azure “decides” when to apply them and reboots the VM if needed. For other classifications, Azure Update Management or a legacy solution is recommended to cover the rest of the OS update scope.
This new feature seems great for non-production VMs without specific scheduled tasks or dependencies on other VMs, since they may become unavailable without notice. A perfect candidate would be a POC VM, where security must stay up to date but non-critical updates would not be missed.
The Azure Activity Log of a VM with Automatic Updates by Platform enabled shows the patch actions:
Azure provides these services for free. Azure Update Management should be considered as a progressive replacement for existing legacy tooling to ease patch management. It can be configured to automatically onboard new VMs and apply existing schedules to keep them up to date. Automatic Updates by Platform can help turn patch management back into a “back office” task for non-critical workloads. Base reporting isn’t as deep as the reporting provided by update servers like WSUS. Update selection or exclusion is also quite cumbersome (for example, Microsoft KBs must be declared one by one in the schedule). Still, the day-to-day benefit for the IT team makes it worth a try.