Generated job name is invalid
Version v0.15.0-rc2
Platform/Architecture Linux talos-test04 6.12.11-talos #1 SMP Tue Jan 28 09:32:23 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Describe the bug The generated Jobs have invalid names because they end with a "-":
$ kubectl logs -n kube-system system-upgrade-8445f958db-knnpd
[...]
I0215 06:38:31.911115 1 event.go:389] "Event occurred" object="kube-system/talos" fieldPath="" kind="Plan" apiVersion="upgrade.cattle.io/v1" type="Normal" reason="SyncJob" message="Jobs synced for version v1.9.4 on Nodes talos-test02. Hash: "
time="2025-02-15T06:38:31Z" level=error msg="error syncing 'kube-system/talos': handler system-upgrade: secrets \"system-upgrade\" not found, handler system-upgrade: failed to create kube-system/apply-talos-on-talos-test02-with- batch/v1, Kind=Job for system-upgrade kube-system/talos: Job.batch \"apply-talos-on-talos-test02-with-\" is invalid: [metadata.name: Invalid value: \"apply-talos-on-talos-test02-with-\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), spec.template.labels: Invalid value: \"apply-talos-on-talos-test02-with-\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')], requeuing"
To Reproduce
- deploy v0.15.0-rc2
- add a Plan
Deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    meta.helm.sh/release-name: system-upgrade
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2025-02-14T12:19:07Z"
  generation: 3
  labels:
    app.kubernetes.io/component: system-upgrade
    app.kubernetes.io/instance: system-upgrade
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: system-upgrade
    helm.sh/chart: app-template-3.7.1
    helm.toolkit.fluxcd.io/name: system-upgrade
    helm.toolkit.fluxcd.io/namespace: kube-system
  name: system-upgrade
  namespace: kube-system
  resourceVersion: "146247497"
  uid: f406b0b7-74ab-4429-bd6a-5af8b1e2581a
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app.kubernetes.io/component: system-upgrade
      app.kubernetes.io/instance: system-upgrade
      app.kubernetes.io/name: system-upgrade
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        checksum/secrets: f9a2edb516d89dc9e0af00dcf3d13ae57cbe1bc631c4b35d393a497ef218d929
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: system-upgrade
        app.kubernetes.io/instance: system-upgrade
        app.kubernetes.io/name: system-upgrade
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
      automountServiceAccountToken: true
      containers:
      - env:
        - name: SYSTEM_UPGRADE_CONTROLLER_LEADER_ELECT
          value: "true"
        - name: SYSTEM_UPGRADE_CONTROLLER_NAME
          value: system-upgrade
        - name: SYSTEM_UPGRADE_CONTROLLER_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: SYSTEM_UPGRADE_CONTROLLER_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: SYSTEM_UPGRADE_JOB_BACKOFF_LIMIT
          value: "99"
        - name: SYSTEM_UPGRADE_JOB_PRIVILEGED
          value: "false"
        image: docker.io/rancher/system-upgrade-controller:v0.15.0-rc2
        imagePullPolicy: IfNotPresent
        name: app
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      enableServiceLinks: false
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccount: system-upgrade
      serviceAccountName: system-upgrade
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
Plan YAML:
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  creationTimestamp: "2025-02-14T12:24:10Z"
  generation: 3
  labels:
    app.kubernetes.io/name: system-upgrade-plans
    kustomize.toolkit.fluxcd.io/name: system-upgrade-plans
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: talos
  namespace: kube-system
  resourceVersion: "146247587"
  uid: f02a54af-4644-4ce2-ab9f-e9a8f128e703
spec:
  concurrency: 1
  exclusive: true
  nodeSelector:
    matchExpressions:
    - key: kubernetes.io/os
      operator: In
      values:
      - linux
  postCompleteDelay: 2m
  secrets:
  - ignoreUpdates: true
    name: system-upgrade
    path: /var/run/secrets/talos.dev
  serviceAccountName: system-upgrade
  upgrade:
    args:
    - --node=$(SYSTEM_UPGRADE_NODE_NAME)
    - --tag=$(SYSTEM_UPGRADE_PLAN_LATEST_VERSION)
    - --powercycle
    image: ghcr.io/jfroy/tnu:0.4.0
  version: v1.9.4
Full deployment: https://github.com/tuxpeople/k8s-homelab/tree/97e7256808cd65c0d004d4e58adbfd38e8f5984f/kubernetes/apps/kube-system/system-upgrade
Expected behavior Jobs are created with valid names
Actual behavior The controller fails to create the Jobs
Additional context I'm not a programmer, but I dug around a bit and I think the name gets created here: https://github.com/rancher/system-upgrade-controller/blob/98381a657c80b9395c141f1c745f257d9a7826c2/pkg/upgrade/job/job.go#L179
Therefore, I think plan.Status.LatestHash is empty. I assume it's coming from here:
https://github.com/rancher/system-upgrade-controller/blob/98381a657c80b9395c141f1c745f257d9a7826c2/pkg/apis/upgrade.cattle.io/v1/types.go#L65
But if I do a kubectl get plan, the plan does not have a status at all.
Also, the event in the logs does not show any hash either (the message ends with Hash: " and nothing after it).
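If that guess is right, the job name would be assembled with an empty hash suffix. A rough, purely illustrative sketch (not the controller's actual code) of what I think is happening:

package main

import "fmt"

func main() {
	// Purely illustrative: if plan.Status.LatestHash is never populated,
	// the generated job name ends with "-" and the API server rejects it.
	planName := "talos"
	nodeName := "talos-test02"
	latestHash := "" // this is what plan.Status.LatestHash seems to be in my cluster

	jobName := fmt.Sprintf("apply-%s-on-%s-with-%s", planName, nodeName, latestHash)
	fmt.Println(jobName) // apply-talos-on-talos-test02-with-
}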