Extend Upgrade Controller for Windows Node Compatibility
Summary
Adds support for Windows nodes by enabling the controller to detect when a node is running Windows and to use a Windows-compatible kubectl image and appropriate security context settings.
Key Changes
Introduces a new environment variable, SYSTEM_UPGRADE_JOB_KUBECTL_IMAGE_WINDOWS, for specifying the Windows kubectl image to use when performing upgrades that target Windows nodes.
In the job creation logic (in job.go):
- Detects whether the node OS label (kubernetes.io/os) equals “windows” and switches to the Windows image accordingly.
- For Windows nodes, sets SecurityContext.WindowsOptions.HostProcess = true and RunAsUserName = "NT AUTHORITY\SYSTEM" on the init containers (prepare, drain, cordon) and on the main upgrade container.
- Outside job.go, the default manifest (manifests/system-upgrade-controller.yaml) now includes a comment and a default value for SYSTEM_UPGRADE_JOB_KUBECTL_IMAGE_WINDOWS.
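The selection logic described in this section can be sketched roughly as follows. This is an illustration and not the code in job.go. The struct types are local stand-ins that mirror the shape of the Kubernetes core/v1 SecurityContext and WindowsSecurityContextOptions types so that the example compiles without external dependencies. The helper names (isWindows, kubectlImage, securityContext) and the placeholder image name are hypothetical. The sketch returns an error for a missing variable and leaves it to the caller to treat that as fatal, and it omits the existing Linux security context entirely.

```go
package main

import (
	"fmt"
	"os"
)

// Local stand-ins (assumption): they mirror the shape of the core/v1
// SecurityContext and WindowsSecurityContextOptions fields used here.
type WindowsOptions struct {
	HostProcess   *bool
	RunAsUserName *string
}

type SecurityContext struct {
	WindowsOptions *WindowsOptions
}

const (
	osLabel         = "kubernetes.io/os"
	windowsImageEnv = "SYSTEM_UPGRADE_JOB_KUBECTL_IMAGE_WINDOWS"
)

// isWindows reports whether the node labels mark the node as Windows.
func isWindows(labels map[string]string) bool {
	return labels[osLabel] == "windows"
}

// kubectlImage returns the Linux image for non-Windows nodes. For Windows
// nodes it returns the image named by the environment variable, or an error
// when the variable is unset.
func kubectlImage(labels map[string]string, linuxImage string, getenv func(string) string) (string, error) {
	if !isWindows(labels) {
		return linuxImage, nil
	}
	image := getenv(windowsImageEnv)
	if image == "" {
		return "", fmt.Errorf("%s must be set to upgrade Windows nodes", windowsImageEnv)
	}
	return image, nil
}

// securityContext returns the HostProcess settings for Windows nodes and nil
// otherwise (the Linux security context is not modelled in this sketch).
func securityContext(labels map[string]string) *SecurityContext {
	if !isWindows(labels) {
		return nil
	}
	hostProcess := true
	user := `NT AUTHORITY\SYSTEM`
	return &SecurityContext{
		WindowsOptions: &WindowsOptions{HostProcess: &hostProcess, RunAsUserName: &user},
	}
}

func main() {
	node := map[string]string{osLabel: "windows"}
	// "example/kubectl:linux" is a placeholder, not a real default.
	image, err := kubectlImage(node, "example/kubectl:linux", os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	sc := securityContext(node)
	fmt.Println(image, *sc.WindowsOptions.HostProcess, *sc.WindowsOptions.RunAsUserName)
}
```

Passing the environment lookup as a function keeps the selection logic testable without mutating the process environment. Running the sketch without SYSTEM_UPGRADE_JOB_KUBECTL_IMAGE_WINDOWS set exits with the error message.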
Why this is needed
Until now, the upgrade controller assumed Linux nodes and used a Linux kubectl image and a Linux-style security context. Windows nodes have different requirements (e.g., HostProcess containers and a Windows security context), which this change accommodates. Supporting Windows nodes extends the controller to heterogeneous clusters.
Impact & Considerations
- Operators upgrading Windows nodes will need to set the SYSTEM_UPGRADE_JOB_KUBECTL_IMAGE_WINDOWS env var to a Windows-compatible kubectl image.
- The controller fails with a fatal error if the env var is not set when an upgrade targets a Windows node.
- Security contexts differ: on Windows nodes the job containers run as HostProcess containers (HostProcess=true) under NT AUTHORITY\SYSTEM, which gives them host-level privileges. Review the security implications for your cluster accordingly.
- Existing Linux-only flows remain unchanged, so backward compatibility is maintained.
Testing & Validation
- Verified that for a node labelled kubernetes.io/os=windows, the Windows image is selected and the pod spec carries the Windows security context settings.
- Verified Linux nodes continue to use the original image and security context path.
- Verified that the upgrade workflow successfully triggers on both Linux and Windows nodes.