Gardenlet #1601

Merged
merged 1 commit into from Dec 10, 2019



rfranzke commented Nov 7, 2019

What this PR does / why we need it:
Right from the beginning of the Gardener project we started implementing the operator pattern: we have a custom controller-manager that acts on our own custom resources. When you think about the Gardener architecture, you will recognize some interesting similarities to the Kubernetes architecture: shoot clusters can be compared with pods, and seed clusters can be seen as worker nodes. Guided by this observation we introduced the gardener-scheduler. Its main task is to find an appropriate seed cluster to host the control plane for newly ordered clusters, similar to how the kube-scheduler finds an appropriate node for newly created pods. By providing multiple seed clusters for a region (or provider) and distributing the workload, we also reduce the blast radius of potential hiccups.
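The scheduling analogy can be made concrete with a small sketch. It is illustrative only: the `Seed` and `Shoot` classes, their fields, and the "least loaded matching seed" rule are assumptions chosen for the example, not the gardener-scheduler's actual types or strategy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Seed:
    # Hypothetical, simplified view of a seed cluster.
    name: str
    provider: str
    region: str
    shoot_count: int  # shoot control planes already hosted on this seed


@dataclass
class Shoot:
    # Hypothetical, simplified view of a shoot cluster.
    name: str
    provider: str
    region: str
    seed_name: Optional[str] = None  # unset until the shoot is scheduled


def schedule(shoot: Shoot, seeds: List[Seed]) -> Optional[str]:
    """Assign the shoot to the least-loaded seed with the same provider and region.

    Returns the chosen seed name, or None if no seed matches.
    """
    candidates = [
        s for s in seeds
        if s.provider == shoot.provider and s.region == shoot.region
    ]
    if not candidates:
        return None
    # Ties are broken by name so the result is deterministic.
    chosen = min(candidates, key=lambda s: (s.shoot_count, s.name))
    chosen.shoot_count += 1
    shoot.seed_name = chosen.name
    return chosen.name
```

Spreading shoots over several matching seeds is what limits the impact of a problem in any single seed.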

[Image: gardenlet2]

Yet, there is still a significant difference between the Kubernetes and the Gardener architectures: Kubernetes runs a primary "agent" on every node, the kubelet, which is mainly responsible for managing pods and containers on its particular node. Gardener, in contrast, uses its controller-manager, which is responsible for all shoot clusters on all seed clusters and performs its reconciliation loops centrally from the garden cluster.

While this works well at scale for thousands of clusters today, our goal is to enable true scalability following the Kubernetes principles (beyond the capacity of a single controller-manager): We are now working on distributing the logic (or the Gardener operator) into the seed cluster and will introduce a corresponding component, aptly named the gardenlet. It will be Gardener's primary "agent" on every seed cluster and will be responsible only for the shoot clusters located in its particular seed cluster.

The gardener-controller-manager will keep its control loops for other resources of the Gardener API; however, it will no longer talk to seed/shoot clusters.

Reversing the control flow will even allow placing seed/shoot clusters behind firewalls, without requiring direct accessibility (via VPN tunnels) anymore.
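The shift in responsibility described above can be sketched as follows. This is a toy model, not Gardener code: the function names and the `(shoot, seed)` tuples are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each entry is (shoot name, name of the seed hosting its control plane).
Assignment = Tuple[str, str]


def central_worklist(assignments: Iterable[Assignment]) -> List[str]:
    """Central model: one controller-manager reconciles every shoot on every seed."""
    return [shoot for shoot, _seed in assignments]


def gardenlet_worklists(assignments: Iterable[Assignment]) -> Dict[str, List[str]]:
    """Distributed model: each seed's gardenlet reconciles only its own shoots."""
    per_seed: Dict[str, List[str]] = defaultdict(list)
    for shoot, seed in assignments:
        per_seed[seed].append(shoot)
    return dict(per_seed)


if __name__ == "__main__":
    demo = [("shoot-a", "seed-1"), ("shoot-b", "seed-2"), ("shoot-c", "seed-1")]
    print(central_worklist(demo))
    print(gardenlet_worklists(demo))
```

In the distributed model the work per agent is bounded by the number of shoots on one seed rather than by the size of the whole landscape.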

Which issue(s) this PR fixes:
Fixes #1576
Fixes #1592

Special notes for your reviewer:

  • Filtering for objects matching only the targeted shoots is currently done only on the watch level, but the cache still contains all objects (as there is no straightforward way to do this with the shared informer factories). We decided not to invest further into the issue as we want to refactor the controllers based on the controller-runtime eventually anyway, and we don't have such scalability concerns at the moment.

  • After this PR is merged I will open issues for follow-up work that needs to be done:

    • Gardenlet automated certificate rotation
    • Use NodeLease for heartbeats (see this comment)
    • Write seed authenticator that only allows the Gardenlet to handle resources related to its targeted seed(s)
    • Gardenlet should dynamically refresh secrets read from the Garden cluster (domain secrets, alerting secrets, etc.)
    • The GardenletReady condition should be picked up by the shoot care controller and added to the .status.conditions for Shoots that are used as seeds.
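The first note above, about filtering at the watch level while the cache still holds every object, can be pictured with a toy model. This is one reading of that note and deliberately does not use the client-go informer API: `SharedCache`, `add_handler`, and the `seedName` key are hypothetical names for illustration.

```python
from typing import Callable, Dict, List


class SharedCache:
    """Toy stand-in for a shared informer: it stores every object it sees."""

    def __init__(self) -> None:
        self.store: Dict[str, dict] = {}
        self._handlers: List[Callable[[dict], None]] = []

    def add_handler(
        self,
        handler: Callable[[dict], None],
        predicate: Callable[[dict], bool] = lambda obj: True,
    ) -> None:
        """Register a handler that only fires for objects matching the predicate."""
        def filtered(obj: dict) -> None:
            if predicate(obj):
                handler(obj)
        self._handlers.append(filtered)

    def on_event(self, obj: dict) -> None:
        # The cache is populated regardless of any handler predicate.
        self.store[obj["name"]] = obj
        for handler in self._handlers:
            handler(obj)


def shoots_on_seed(seed_name: str) -> Callable[[dict], bool]:
    """Predicate selecting shoots scheduled to the given seed."""
    return lambda obj: obj.get("seedName") == seed_name
```

Only matching events reach the handler, yet memory use still grows with the total number of objects, which is the limitation the note accepts for now.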

Release note:

With this PR we incorporate a major architectural change, namely, the introduction of a new Gardener component: the gardenlet.
* With previous versions of Gardener we were running the control loops for all shoot clusters and all seed clusters centrally in the garden cluster (`gardener-controller-manager`).
* Now, we have split the `gardener-controller-manager` and factored out the control loops that involve communication with seed and shoot clusters into the new `gardenlet` component.
* The motivation was twofold: primarily to enable true scalability (beyond the capacity of a single, central controller-manager), and secondarily to allow running seed and shoot clusters in isolated networks.
* With the gardenlet, we distribute the shoot reconciliation (mainly, but also others) control loops into all seed clusters, effectively reducing the load and responsibility of a single gardenlet.
* Gardener's architecture is now even more comparable with the Kubernetes architecture: The Gardener control plane consists of the `gardener-apiserver`, `gardener-controller-manager`, and `gardener-scheduler`, while the `gardenlet` is the primary agent running in every seed cluster. Take a look at this [comparison diagram](https://user-images.githubusercontent.com/19169361/68412287-38e80380-018c-11ea-9eb0-d8bdbd9493ba.png).
* Unlike the kubelet, the gardenlet can control more than one seed cluster (although we don't recommend this setup for production). You can even run a single gardenlet in the garden cluster controlling all the seed clusters, resulting in the same Gardener v0 architecture. The landscape operator is responsible for designing its landscape; however, for the reasons mentioned we recommend running one gardenlet per seed.
* Please find a more detailed description [here](https://github.com/gardener/gardener/blob/master/docs/concepts/gardenlet.md).

* Migration from previous Gardener versions:
  * :warning: Be aware that the `gardener` Helm chart is now split into two separate Helm charts: `controlplane` and `gardenlet`. Also, some keys in the chart values have been moved around!
  * Please find the migration instructions [here](https://github.com/gardener/gardener/blob/master/docs/concepts/gardenlet.md#migrating-from-previous-gardener-versions).

* Removals and notable changes
  * The `SeedAvailable` condition no longer exists and has been replaced by `Bootstrapped` and `GardenletReady`.
  * The `spec.secretRef` field in the `Seed` resource is now optional. It is only required in case the `Seed` is controlled by a Gardenlet that runs outside of the seed cluster itself.
  * The `Logging` and `HVPA` feature gates have been moved from the `gardener-controller-manager` to the `gardenlet`.
  * The `Seed` status now contains a new `kubernetesVersion` field into which the gardenlet reports the Kubernetes version of the seed cluster.
  * The printer columns for `kubectl get seeds` have been reworked.
  * The `gardener-controller-manager` features two new controllers:
    * The seed lifecycle controller. Its main task is to set the `GardenletReady` condition to `Unknown` for `Seed` resources which don't receive heartbeats from the gardenlet anymore.
    * The CSR auto-approval controller watches `CertificateSigningRequest`s and auto-approves them in case they were filed by a gardenlet.
Developers need to run `make dev-setup` again, and `make start-gardenlet` in order to start the gardenlet. Please find more instructions on how to set up the local development environment [here](https://github.com/gardener/gardener/blob/master/docs/development/local_setup.md).
The base image version for all Gardener Docker images is now `alpine:3.10`.
All `garden.sapcloud.io:...` RBAC resources have been renamed to `gardener.cloud:...`.
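The seed lifecycle controller mentioned in the release note sets `GardenletReady` to `Unknown` once heartbeats stop. A minimal sketch of that decision follows; the function name, the status strings other than `Unknown`, and the grace period are assumptions for illustration, since the release note does not state how long the controller waits.

```python
from datetime import datetime, timedelta, timezone


def gardenlet_ready_status(
    current_status: str,
    last_heartbeat: datetime,
    now: datetime,
    grace_period: timedelta,
) -> str:
    """Return the GardenletReady status a lifecycle controller would set.

    If the last heartbeat is older than the grace period, the status becomes
    "Unknown"; otherwise the status reported by the gardenlet is kept.
    """
    if now - last_heartbeat > grace_period:
        return "Unknown"
    return current_status


if __name__ == "__main__":
    now = datetime(2019, 12, 10, 12, 0, tzinfo=timezone.utc)
    stale = now - timedelta(minutes=5)
    # The 40-second grace period is a made-up value for this example.
    print(gardenlet_ready_status("True", stale, now, timedelta(seconds=40)))
```

The controller only marks the condition as `Unknown` rather than `False`, because missing heartbeats say nothing certain about the seed's actual health.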
rfranzke force-pushed the rfranzke:feature/gardenlet branch from adfbf59 to d10dd53 Dec 10, 2019
rfranzke merged commit 883fd82 into gardener:master Dec 10, 2019
4 checks passed
concourse-ci/check: Concourse CI build success
concourse-ci/publish: Concourse CI build success
concourse-ci/test: Concourse CI build success
license/cla: Contributor License Agreement is signed.
rfranzke deleted the rfranzke:feature/gardenlet branch Dec 10, 2019