How to Set Up the NVIDIA Driver on an NV-Series Azure VM

I recently had the opportunity to assist with a project where a partner was using N-Series Azure VMs. My part of the effort was developing a script to automate the setup of the VMs. An Azure Resource Manager (ARM) template was used to set up and configure the VMs, because that kept this part of the project consistent with the several other ARM templates used elsewhere.

Setting up Azure VMs using ARM templates is common, and there are many articles, blog posts, and sample templates available to help you get started. That in itself isn’t especially interesting. The interesting part, at least for me, was the N-Series aspect. N-Series VMs require a separate step to install the NVIDIA driver before you can take advantage of the VM’s GPU. There are instructions for installing the driver, but they assume you are willing to remote into each VM after you create it and then run an installation program. That’s tolerable if you only do it a few times. Any more than that, and it’s time for automation.

The v370.12 driver (the version linked from the Azure documentation page at the time of writing) is packaged as a self-extracting executable. It first extracts the setup components to a directory and then runs the setup program. By scouring a few other blog posts about silent installs of NVIDIA drivers, I was able to piece together the switches needed to run the installation program silently:

> 370.12_grid_win8_win7_server2012R2_server2008R2_64bit_international.exe -s -noreboot -clean

The -s switch runs the installation silently, -noreboot prevents a reboot after the installation completes, and -clean performs a clean install (restoring all NVIDIA settings to their default values).

Next, I needed to work that into a PowerShell script executed via a custom script extension. That way, ARM can do its thing by provisioning the VM and related resources (NIC, virtual network, IP address, etc.) and then invoke a PowerShell script to install the NVIDIA driver.

The custom script extension will execute a few different steps:

  1. Download the NVIDIA driver setup file from Azure Blob storage. I put the setup file in Blob storage to ensure that this specific driver version is the one installed.
  2. Download a PowerShell script that runs the NVIDIA driver setup program silently.
  3. Wait for the installation program to finish.
  4. Force a reboot of the VM.

Note that the driver installation and GPU detection can take a couple of minutes.

As you can see in the following snippets, the custom script extension and related PowerShell script are fairly trivial.

ARM Template Custom Script Extension

{
  "type": "extensions",
  "name": "CustomScriptExtension",
  "apiVersion": "2015-06-15",
  "location": "[resourceGroup().location]",
  "dependsOn": [
    "[variables('vmName')]"
  ],
  "properties": {
    "publisher": "Microsoft.Compute",
    "type": "CustomScriptExtension",
    "typeHandlerVersion": "1.8",
    "autoUpgradeMinorVersion": true,
    "settings": {
      "fileUris": [
        "[concat(variables('assetStorageUrl'), variables('scriptFileName'))]",
        "[concat(variables('assetStorageUrl'), variables('nvidiaDriverSetupName'))]"
      ]
    },
    "protectedSettings": {
      "commandToExecute": "[concat('powershell -ExecutionPolicy Unrestricted -File ', variables('scriptFileName'), ' ', variables('scriptParameters'))]",
      "storageAccountName": "[parameters('assetStorageAccountName')]",
      "storageAccountKey": "[listKeys(concat('Microsoft.Storage/storageAccounts/', parameters('assetStorageAccountName')), '2015-06-15').key1]"
    }
  }
}

PowerShell script executed by the Custom Script Extension

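<# Custom script for Windows: silently install the NVIDIA driver, then reboot. #>
param(
    # File name of the NVIDIA setup package, downloaded by the extension into the working directory.
    [Parameter(Mandatory = $true)]
    [string] $nvidiaDriverSetupPath
)

# ----- Silent install of the NVIDIA driver -----
# The call operator can return before the setup program launched by the
# self-extracting package has finished, so the script waits a fixed interval below.
& ".\$nvidiaDriverSetupPath" -s -noreboot -clean

# ----- Sleep to allow the setup program to finish. -----
Start-Sleep -Seconds 120

# ----- The NVIDIA driver installation requires a reboot. -----
Restart-Computer -Force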
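
The fixed 120-second sleep is the simplest way to wait. However, it can cut the installation short on a slow VM or waste time on a fast one. A possible alternative is Start-Process -Wait, which is documented to wait for the process and its descendants, including the setup program that the self-extracting package launches. This is a minimal sketch that assumes the package honors the same -s -noreboot -clean switches. The function name Install-NvidiaDriver is hypothetical, and the sketch has not been verified against every driver version.

```powershell
<# Hypothetical helper: run the NVIDIA setup package silently and wait for it to exit. #>
function Install-NvidiaDriver {
    [CmdletBinding()]
    param(
        # Path to the self-extracting package, e.g. the 370.12 .exe downloaded by the extension.
        [Parameter(Mandatory = $true)]
        [string] $SetupPath
    )

    # Resolve to a full path so the call does not depend on the process working directory.
    $fullPath = (Resolve-Path -Path $SetupPath).Path

    # -Wait blocks until the process and the processes it starts have exited;
    # -PassThru returns the Process object so the exit code can be read.
    $process = Start-Process -FilePath $fullPath `
        -ArgumentList '-s', '-noreboot', '-clean' `
        -Wait -PassThru

    # Return the exit code instead of throwing: which codes NVIDIA's setup uses
    # (for example, to signal a pending reboot) is not covered here.
    Write-Verbose "NVIDIA setup exited with code $($process.ExitCode)."
    return $process.ExitCode
}
```

In the custom script, this would replace both the call operator line and the Start-Sleep line: $exitCode = Install-NvidiaDriver -SetupPath $nvidiaDriverSetupPath, followed by Restart-Computer -Force as before.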

In this scenario, I also need to get the assets used by the custom script extension – the NVIDIA driver setup file and PowerShell script (which will execute the NVIDIA driver setup file) – uploaded to Azure Blob storage.  That can easily be accomplished with the same PowerShell script used to deploy the ARM template.  That script will perform the following tasks:

  1. Create a new resource group
  2. Create a new storage account and container
  3. Upload the NVIDIA driver setup file and related PowerShell script to the newly created storage account
  4. Execute the ARM template

You can find the full ARM template, custom script, and deployment script on my GitHub project which accompanies this post.

To verify that it all worked, I can RDP into the VM and check the driver installation.
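
Once you are connected, one quick check is a PowerShell query of the Win32_VideoController WMI class. It lists each display adapter with its driver version and status. This is a minimal sketch, and the function name Get-GpuDriverInfo is hypothetical.

```powershell
<# Hypothetical helper: list display adapters and their driver versions on the VM. #>
function Get-GpuDriverInfo {
    # Win32_VideoController returns one instance per display adapter; after a successful
    # install the GPU should be listed under its NVIDIA model name.
    Get-CimInstance -ClassName Win32_VideoController |
        Select-Object -Property Name, DriverVersion, Status
}
```

Run Get-GpuDriverInfo in a PowerShell session on the VM after the reboot. Note that DriverVersion uses the Windows driver version format rather than NVIDIA's "370.12" style.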


What about unsigned drivers?

An earlier version of the NVIDIA driver, v369.95, was not digitally signed. It was also provided as a ZIP file rather than an EXE (as v370.12 is). Using this version of the NVIDIA driver requires a few changes to the setup script. First, the file contents need to be extracted. That’s doable with some PowerShell in the script executed via the custom script extension. Getting around the lack of a digitally signed driver is a bit more . . . interesting. If you install the driver manually, Windows prompts you to confirm that installing the driver is REALLY what you want.

Screenshot: Windows prompt to confirm installation of the unsigned NVIDIA driver
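
This is a minimal sketch of the extraction step, assuming the package is a standard ZIP archive. It uses the .NET ZipFile class rather than Expand-Archive, because Expand-Archive requires PowerShell 5.0 and may not be available on every image. The function name and paths are hypothetical. This post does not say which setup program inside the v369.95 package to run after extraction.

```powershell
<# Hypothetical helper: extract a ZIP driver package (such as v369.95) into a folder. #>
function Expand-NvidiaDriverPackage {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)] [string] $ZipPath,
        [Parameter(Mandatory = $true)] [string] $DestinationPath
    )

    # Load the assembly that contains System.IO.Compression.ZipFile (.NET Framework 4.5+).
    Add-Type -AssemblyName System.IO.Compression.FileSystem

    # .NET resolves relative paths against the process directory, not the PowerShell
    # location, so convert both paths to full file-system paths first.
    $zipFullPath = (Resolve-Path -Path $ZipPath).Path
    $destFullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($DestinationPath)

    # ExtractToDirectory fails if a target file already exists, so start from a clean folder.
    if (Test-Path -Path $destFullPath) {
        Remove-Item -Path $destFullPath -Recurse -Force
    }

    # Creates the destination folder and extracts every entry into it.
    [System.IO.Compression.ZipFile]::ExtractToDirectory($zipFullPath, $destFullPath)
    return $destFullPath
}
```

For example, Expand-NvidiaDriverPackage -ZipPath .\369.95.zip -DestinationPath .\nvidia would extract a hypothetically named package into a subfolder of the extension's working directory.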

Completing the manual installation installs a certificate into the VM’s Trusted Publishers certificate store. The certificate can then be exported and saved to Azure Blob storage.

Screenshot: the NVIDIA certificate in the Trusted Publishers store, shown in Certificate Manager
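
The export can also be scripted on the VM where the driver was installed manually. This sketch assumes the publisher certificate’s subject contains "NVIDIA". It uses the Export-Certificate cmdlet from the Windows PKI module. The function name and output path are hypothetical.

```powershell
<# Hypothetical helper: export the NVIDIA publisher certificate added during a manual install. #>
function Export-NvidiaPublisherCertificate {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)] [string] $FilePath,
        # Assumption: the publisher certificate's subject contains "NVIDIA".
        [string] $SubjectFilter = '*NVIDIA*'
    )

    # Look only in the local machine's Trusted Publishers store.
    $certs = @(Get-ChildItem -Path Cert:\LocalMachine\TrustedPublisher |
        Where-Object { $_.Subject -like $SubjectFilter })

    # Refuse to guess if zero or several certificates match.
    if ($certs.Count -ne 1) {
        throw "Expected one matching certificate, found $($certs.Count)."
    }

    # -Type CERT writes a single DER-encoded .cer file.
    Export-Certificate -Cert $certs[0] -FilePath $FilePath -Type CERT
}
```

For example, Export-NvidiaPublisherCertificate -FilePath C:\temp\nvidia.cer produces a file that can be uploaded alongside the other assets, for instance with Set-AzStorageBlobContent.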

I can use that certificate as part of the automated install process. With the certutil.exe program, it is possible to install the certificate into the Trusted Publishers store on a new VM. This step can be included in the PowerShell script executed via the custom script extension.
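
Here is a minimal sketch of that step. It assumes the exported .cer file is added to the extension’s fileUris so that it lands in the script’s working directory. The function and file names are hypothetical. The command certutil -addstore TrustedPublisher adds the certificate to the local machine’s Trusted Publishers store.

```powershell
<# Hypothetical helper: trust the exported NVIDIA certificate on a new VM before running setup. #>
function Import-TrustedPublisherCertificate {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)] [string] $CertificatePath
    )

    $fullPath = (Resolve-Path -Path $CertificatePath).Path

    # Without -user, certutil targets the local machine store, which requires elevation;
    # the Custom Script Extension runs as SYSTEM, so that is satisfied there.
    & certutil.exe -addstore TrustedPublisher $fullPath

    # certutil sets a non-zero exit code on failure.
    if ($LASTEXITCODE -ne 0) {
        throw "certutil.exe failed with exit code $LASTEXITCODE."
    }
}
```

In the custom script, Import-TrustedPublisherCertificate -CertificatePath .\nvidia.cer would run before the driver setup, so the publisher is already trusted when the unsigned driver is installed.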

An example of this approach can be found at https://github.com/mcollier/setup-nvidia-drivers-azure-vm/tree/driver-369.95.

Alternative Approach

An alternative approach is to create a custom VM image with the necessary NVIDIA driver already installed, along with any additional software or configuration you need. The advantage of this approach is that you don’t have to go through the custom script step. However, any new VM deployed from such an image still needs a reboot after GPU detection on first startup. The disadvantage is that you then take on responsibility for keeping the image patched on a regular basis. Images provided by Microsoft, by contrast, are patched regularly (often at least once per month).
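
The post doesn’t show how to build such an image. One common way, sketched here for managed images with the Az PowerShell module, is to run Sysprep inside the configured VM. You then deallocate the VM, mark it as generalized, and capture an image. The function name and parameters are hypothetical.

```powershell
<#
  Hypothetical sketch: capture a managed image from a VM that already has the NVIDIA
  driver installed. Generalize the VM from inside first, for example:
    C:\Windows\System32\Sysprep\sysprep.exe /generalize /oobe /shutdown
#>
function New-NvidiaDriverImage {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)] [string] $ResourceGroupName,
        [Parameter(Mandatory = $true)] [string] $VmName,
        [Parameter(Mandatory = $true)] [string] $ImageName
    )

    # Deallocate the sysprepped VM and mark it as generalized.
    Stop-AzVM -ResourceGroupName $ResourceGroupName -Name $VmName -Force | Out-Null
    Set-AzVM -ResourceGroupName $ResourceGroupName -Name $VmName -Generalized | Out-Null

    # Build an image configuration from the VM and create the managed image.
    $vm = Get-AzVM -ResourceGroupName $ResourceGroupName -Name $VmName
    $imageConfig = New-AzImageConfig -Location $vm.Location -SourceVirtualMachineId $vm.Id
    New-AzImage -Image $imageConfig -ImageName $ImageName -ResourceGroupName $ResourceGroupName
}
```

A template can then reference the resulting image by its resource ID in storageProfile.imageReference.id instead of using a Marketplace image.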

Resources

Here are some resources that helped me come up with the solution presented above.