CFP: removing most hostAliases usage in ClusterMesh

Cilium Feature Proposal

Is your proposed feature related to a problem?

Using hostAliases (https://kubernetes.io/docs/tasks/network/customize-hosts-file-for-pods/) in ClusterMesh lets us provide ${clusterName}.mesh.cilium.io domains, which is especially useful when users provide IPs directly (no domains) in the cluster address.

However, because hostAliases are part of the pod template, changing them restarts various Cilium components; if KVStoreMesh is unused, it even restarts all agents. This is quite unfortunate because we otherwise support reloading the cluster config without any restart.

Describe the feature you'd like

I think that the natural evolution to remove hostAliases would be to provide a headless Kubernetes Service, for instance remote-${clusterName} with EndpointSlice created within Helm.
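As a rough sketch of what Helm could render for one remote cluster, the manifests might look like the following. The cluster name `cluster2`, the `kube-system` namespace, the IPs, the port name and number, and the `managed-by` value are all illustrative assumptions, not what the chart does today.

```yaml
# Sketch only: selector-less headless Service for a hypothetical remote cluster "cluster2".
apiVersion: v1
kind: Service
metadata:
  name: remote-cluster2            # remote-${clusterName}
  namespace: kube-system           # assumed ${ciliumNamespace}
spec:
  clusterIP: None                  # headless: DNS answers with the endpoint IPs directly
  ports:
    - name: etcd                   # assumed port name; must match the EndpointSlice port name
      port: 2379                   # assumed remote etcd / clustermesh-apiserver port
      protocol: TCP
---
# No selector on the Service, so the EndpointSlice is created by hand (here: by Helm).
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: remote-cluster2
  namespace: kube-system
  labels:
    kubernetes.io/service-name: remote-cluster2        # ties the slice to the Service
    endpointslice.kubernetes.io/managed-by: cilium-helm  # hypothetical manager name
addressType: IPv4
ports:
  - name: etcd
    port: 2379
    protocol: TCP
endpoints:
  - addresses: ["192.0.2.10"]      # example IPs from the user-provided cluster address
  - addresses: ["192.0.2.11"]
```

With these objects in place, `remote-cluster2.kube-system.svc` would resolve through cluster DNS to both IPs, and updating the EndpointSlice would not touch any pod template.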

We should be able to retire the specific clustermesh domain and instead have this domain in the server certificate: remote-${localClusterName}.${ciliumNamespace}.svc (note that we don't specify cluster.local so that it works if the user changes this in their clusters).

With KVStoreMesh enabled we should have no bootstrap problem.

Without KVStoreMesh, however, the agent would need to resolve the etcd IPs directly, and this creates a dependency loop with CoreDNS.

This PR solved a similar issue, but with KVStoreMesh in mind: !40786 (merged). Here, however, we might want to let the etcd client load balance across the different IPs provided instead of picking one IP (picking one IP is crucial for clustermesh-apiserver HA, but that is not necessarily true for a remote kvstore/etcd).

The simplest way to solve this might be to still use hostAliases (with the remote-${localClusterName}.${ciliumNamespace}.svc entry, which is weird but should work), since the config combination involved (not using a domain and not using KVStoreMesh) should be uncommon. Alternatively, we could use a service resolver: either the existing ServiceBackendResolver or, better (but "more complex"), a gRPC resolver that can resolve to multiple addresses (here is the default etcd one for instance: https://github.com/cilium/cilium/blob/main/vendor/go.etcd.io/etcd/client/v3/internal/resolver/resolver.go).

To release this change, we should be able to add the remote-${localClusterName}.${ciliumNamespace}.svc domain to the server cert by default in minor release X, and then in minor release X+1 actually add and use the Service, while announcing the breaking change for users who do not rely on our Helm chart's automatic certificate generation.

The major problem with all of this is that ArgoCD recently added a default "resource exclusion" list which, for performance reasons, includes Endpoint and EndpointSlice. IIUC this essentially means that we would be incompatible with ArgoCD by default, and the only workaround is for users to un-exclude Endpoint/EndpointSlice globally for their entire ArgoCD (which essentially means overriding the parameter in ArgoCD with all the defaults minus Endpoint/EndpointSlice). There is an issue on ArgoCD about adding more granularity to this, with quite a few +1s on a comment describing exactly this situation of wanting to create Endpoint/EndpointSlice: https://github.com/argoproj/argo-cd/issues/16196#issuecomment-2847033190.

Unfortunately, I think this is a blocker to moving forward, because we would have to either:

  • Degrade the install experience of ArgoCD users unless they remove the EndpointSlice exclusion
  • Support both the old hostAliases way and the Service, which prevents us from transitioning cleanly :(
  • Use another workaround, such as putting some DNS config in a ConfigMap and injecting it into the resolvers, but this sounds a bit hacky :(

One acceptable workaround, though, could be an "emulate service mode" where we emulate the Service object with hostAliases: we would not create the Service and would instead add hostAliases entries for remote-${clusterName}.${ciliumNamespace}.svc. This would essentially revert to the previous behavior, but at least most non-ArgoCD users would have no hostAliases (and no associated restarts), and ArgoCD users who removed the EndpointSlice exclusion could leave this mode disabled (and if ArgoCD improves or facilitates this, we could remove this mode in the future too!).
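To make the "emulate service mode" concrete, here is a hedged sketch of the pod template fragment it could render. The cluster name `cluster2`, the `kube-system` namespace, and the IPs are illustrative assumptions; no such Helm option exists today.

```yaml
# Sketch only: fragment of a pod template (e.g. the agent DaemonSet) in the
# hypothetical "emulate service mode". No Service or EndpointSlice is created.
spec:
  template:
    spec:
      hostAliases:
        # One entry per IP from the user-provided cluster address; each maps the
        # name the Service would have had to that IP via /etc/hosts.
        - ip: "192.0.2.10"
          hostnames:
            - remote-cluster2.kube-system.svc
        - ip: "192.0.2.11"
          hostnames:
            - remote-cluster2.kube-system.svc
```

Because the hostname is the same one the Service would expose, the server certificate and the client configuration would not need to differ between the two modes; only the restart behavior on address changes would.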

Notify relevant community channels

Notify the members of any relevant code owners below from the [teams] list in the following form:

  • @cilium/sig-clustermesh