Generated job name is invalid

Version v0.15.0-rc2

Platform/Architecture Linux talos-test04 6.12.11-talos #1 SMP Tue Jan 28 09:32:23 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug Generated jobs have invalid names, as they end with a "-":

$ kubectl logs -n kube-system system-upgrade-8445f958db-knnpd
[...]
I0215 06:38:31.911115       1 event.go:389] "Event occurred" object="kube-system/talos" fieldPath="" kind="Plan" apiVersion="upgrade.cattle.io/v1" type="Normal" reason="SyncJob" message="Jobs synced for version v1.9.4 on Nodes talos-test02. Hash: "
time="2025-02-15T06:38:31Z" level=error msg="error syncing 'kube-system/talos': handler system-upgrade: secrets \"system-upgrade\" not found, handler system-upgrade: failed to create kube-system/apply-talos-on-talos-test02-with- batch/v1, Kind=Job for system-upgrade kube-system/talos: Job.batch \"apply-talos-on-talos-test02-with-\" is invalid: [metadata.name: Invalid value: \"apply-talos-on-talos-test02-with-\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), spec.template.labels: Invalid value: \"apply-talos-on-talos-test02-with-\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')], requeuing"

To Reproduce

  1. Deploy v0.15.0-rc2
  2. Add a plan

Deployment-YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    meta.helm.sh/release-name: system-upgrade
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2025-02-14T12:19:07Z"
  generation: 3
  labels:
    app.kubernetes.io/component: system-upgrade
    app.kubernetes.io/instance: system-upgrade
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: system-upgrade
    helm.sh/chart: app-template-3.7.1
    helm.toolkit.fluxcd.io/name: system-upgrade
    helm.toolkit.fluxcd.io/namespace: kube-system
  name: system-upgrade
  namespace: kube-system
  resourceVersion: "146247497"
  uid: f406b0b7-74ab-4429-bd6a-5af8b1e2581a
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app.kubernetes.io/component: system-upgrade
      app.kubernetes.io/instance: system-upgrade
      app.kubernetes.io/name: system-upgrade
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        checksum/secrets: f9a2edb516d89dc9e0af00dcf3d13ae57cbe1bc631c4b35d393a497ef218d929
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: system-upgrade
        app.kubernetes.io/instance: system-upgrade
        app.kubernetes.io/name: system-upgrade
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
      automountServiceAccountToken: true
      containers:
      - env:
        - name: SYSTEM_UPGRADE_CONTROLLER_LEADER_ELECT
          value: "true"
        - name: SYSTEM_UPGRADE_CONTROLLER_NAME
          value: system-upgrade
        - name: SYSTEM_UPGRADE_CONTROLLER_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: SYSTEM_UPGRADE_CONTROLLER_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: SYSTEM_UPGRADE_JOB_BACKOFF_LIMIT
          value: "99"
        - name: SYSTEM_UPGRADE_JOB_PRIVILEGED
          value: "false"
        image: docker.io/rancher/system-upgrade-controller:v0.15.0-rc2
        imagePullPolicy: IfNotPresent
        name: app
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      enableServiceLinks: false
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccount: system-upgrade
      serviceAccountName: system-upgrade
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists

Plan-YAML:

apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  creationTimestamp: "2025-02-14T12:24:10Z"
  generation: 3
  labels:
    app.kubernetes.io/name: system-upgrade-plans
    kustomize.toolkit.fluxcd.io/name: system-upgrade-plans
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: talos
  namespace: kube-system
  resourceVersion: "146247587"
  uid: f02a54af-4644-4ce2-ab9f-e9a8f128e703
spec:
  concurrency: 1
  exclusive: true
  nodeSelector:
    matchExpressions:
    - key: kubernetes.io/os
      operator: In
      values:
      - linux
  postCompleteDelay: 2m
  secrets:
  - ignoreUpdates: true
    name: system-upgrade
    path: /var/run/secrets/talos.dev
  serviceAccountName: system-upgrade
  upgrade:
    args:
    - --node=$(SYSTEM_UPGRADE_NODE_NAME)
    - --tag=$(SYSTEM_UPGRADE_PLAN_LATEST_VERSION)
    - --powercycle
    image: ghcr.io/jfroy/tnu:0.4.0
  version: v1.9.4

Full deployment: https://github.com/tuxpeople/k8s-homelab/tree/97e7256808cd65c0d004d4e58adbfd38e8f5984f/kubernetes/apps/kube-system/system-upgrade

Expected behavior Jobs to be created with valid names

Actual behavior Controller fails to create jobs

Additional context I'm not a programmer, but I dug in a bit and I think the name gets created here: https://github.com/rancher/system-upgrade-controller/blob/98381a657c80b9395c141f1c745f257d9a7826c2/pkg/upgrade/job/job.go#L179
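
If the job name is assembled by joining the pieces visible in the error ("apply", the plan name, "on", the node name, "with", the plan hash) with "-" — my assumption from the message, not verified against the actual code — then an empty hash would produce exactly the trailing "-" seen above:

package main

import (
    "fmt"
    "strings"
)

// Hypothetical reconstruction of the naming pattern "apply-<plan>-on-<node>-with-<hash>"
// seen in the error message; not the controller's actual implementation.
func jobName(plan, node, latestHash string) string {
    return strings.Join([]string{"apply", plan, "on", node, "with", latestHash}, "-")
}

func main() {
    // With an empty hash the generated name ends in "-", which the API server rejects.
    fmt.Println(jobName("talos", "talos-test02", ""))
    // prints: apply-talos-on-talos-test02-with-
}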

Therefore, I think plan.Status.LatestHash is empty. I assume it's coming from here: https://github.com/rancher/system-upgrade-controller/blob/98381a657c80b9395c141f1c745f257d9a7826c2/pkg/apis/upgrade.cattle.io/v1/types.go#L65

But if I do a kubectl get plan, the plan does not have a status at all.

The event in the logs also does not show any hash (Hash: ").