cert-manager Done Right
If you are using Kubernetes, chances are you are also using
cert-manager to automate certificate creation and renewal, whether it is for securing intra-cluster communications or to get a certificate from a fully-fledged Certificate Authority (CA) for public-facing websites. While
cert-manager is very convenient, it needs a lot of credentials to do its magic, and in a shared cluster this can present a security risk. In this post, I will briefly review some of the risks that you may encounter and present a way to properly set up
cert-manager to minimize potential security issues.
How cert-manager works
First, let’s review what happens when you generate a certificate through
cert-manager. You start by creating an
Issuer or ClusterIssuer that holds the required information to tell
cert-manager how to communicate with a CA supporting the ACME protocol, and in some cases some extra identifying information that
cert-manager can provide to the CA on your behalf to identify you. When using the ACME DNS01 mechanism to prove domain ownership, credentials to the DNS provider are also needed for
cert-manager to create the TXT record as part of the DNS challenge.

    apiVersion: cert-manager.io/v1
    kind: Issuer
    metadata:
      name: letsencrypt
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: <email>
        privateKeySecretRef:
          name: letsencrypt
        solvers:
        - dns01:
            cloudflare:
              email: <email>
              apiTokenSecretRef:
                name: cloudflare-api-token-secret
                key: api-token

Once the Issuer or ClusterIssuer is installed, you can request certificates by creating a
Certificate resource referencing the issuer.
cert-manager will then
- create the Certificate Signing Request (CSR)
- send it to the CA using the information provided in the Issuer
- receive a DNS challenge from the CA
- use the DNS provider credentials to create the required TXT record
- periodically check the challenge status and retrieve the signed certificate from the CA
- delete the TXT record used for the DNS challenge
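
To make the flow above concrete, here is a minimal sketch of a Certificate resource that would trigger those steps. It assumes the `letsencrypt` Issuer shown earlier lives in the same namespace. The resource name, the Secret name and the DNS name are hypothetical placeholders, not values from a real setup.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: some-example-com              # hypothetical resource name
spec:
  # Secret in which cert-manager stores the private key and signed certificate
  secretName: some-example-com-tls    # hypothetical Secret name
  dnsNames:
  - some.example.com
  issuerRef:
    name: letsencrypt                 # the Issuer defined earlier
    kind: Issuer                      # an Issuer must be in the same namespace as the Certificate
```

Once the certificate is issued, the referenced Secret should contain the key pair, ready to be consumed by an ingress controller or a workload.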
What could go wrong
There are a lot of moving parts and actors involved when using
cert-manager, and therefore multiple possible attack vectors. A malicious actor with access to the cluster can nicely ask
cert-manager to create arbitrary certificates if the issuer is a ClusterIssuer, and
cert-manager will be happy to comply.
If cert-manager’s read/write credentials to the DNS provider somehow find their way into the wrong hands, an attacker can freely modify DNS records. Depending on your DNS provider, the API token used for DNS updates may also work for other API endpoints such as account management, resulting in an account takeover. In cases where the DNS provider also acts as a registrar, the attacker could transfer out the domain or modify the NS records to point to a DNS provider entirely in their control. They would then be able to redirect your visitors to servers under their control, with completely valid TLS certificates created at their discretion.
Properly setting up cert-manager
Now that we’ve seen some of the possible risks involved with an unnecessarily permissive use of
cert-manager, we’ll explore some ways to protect ourselves. We’ll be using Amazon Route53 as our DNS provider because its IAM settings allow for more granular permissions, although any other DNS provider could do.
First, we’ll edit the DNS zone for
example.com to add some CAA records authorizing the CA we are using, if this has not been done already. CAA records allow us to specify which CAs are allowed to issue certificates for the entire zone. This is not perfect: an attacker can simply use the same CA, and an unauthorized CA could still issue certificates, whether mistakenly or maliciously, even though doing so is a clear violation of CA requirements. It’s still better than nothing and adding a record is easy, so there’s no reason not to use CAA records.
Create a DNS zone for delegation purposes
Next, we’ll create a DNS zone for the sole purpose of solving ACME challenges. In this example, we’ll use
acme.example.com, a subdomain of
example.com, although any other domain would serve that purpose just fine. Ultimately,
cert-manager only needs to create a temporary TXT record, so giving it free write access to the entire DNS zone is overkill. With DNS01, when you request a certificate for
some.example.com, you are asked to publish a TXT record at
_acme-challenge.some.example.com containing a randomized text value, the challenge. The CA then makes a DNS query and expects to get the challenge value back.
There is actually no requirement for the TXT record to be exactly at
_acme-challenge.some.example.com, as long as a TXT query against
_acme-challenge.some.example.com ultimately leads to the challenge value. It is therefore completely valid to instead create a CNAME record at
_acme-challenge.some.example.com pointing to a TXT record at
_acme-challenge.some.other.example.com. This is known as CNAME delegation of ACME challenge TXT records, and the cert-manager documentation mentions it briefly.
Delegate control for the new hosted zone
After creating the
acme.example.com zone, we are given a set of NS records with instructions to add them at the registrar. As this is a subdomain of
example.com, we won’t be doing that however. Instead, we’ll now go back to our
example.com DNS provider, Cloudflare here for illustration purposes, and add those NS records under
acme.example.com. Our zones should now look somewhat like so:

    # example.com
    example.com.       IN NS ivan.ns.cloudflare.com.
    example.com.       IN NS cheryl.ns.cloudflare.com.
    acme.example.com.  IN NS ns-162.awsdns-10.com.
    acme.example.com.  IN NS ns-2004.awsdns-12.co.uk.
    acme.example.com.  IN NS ns-2233.awsdns-14.net.
    acme.example.com.  IN NS ns-1111.awsdns-11.org.

    # acme.example.com
    acme.example.com.  IN NS ns-162.awsdns-10.com.
    acme.example.com.  IN NS ns-2004.awsdns-12.co.uk.
    acme.example.com.  IN NS ns-2233.awsdns-14.net.
    acme.example.com.  IN NS ns-1111.awsdns-11.org.

What we have just done is delegate control of the
acme.example.com subdomain of
example.com to another DNS provider, Amazon Route53 in this instance. Since
acme.example.com will only be used for the purposes of completing the ACME DNS01 challenge, in the event it gets taken over the damage will be smaller than a takeover of the root domain
example.com. When creating a service account to give to
cert-manager, we’re also able to limit its write scope to the
acme.example.com zone, which is convenient if you happen to manage a lot of zones in the same account. Fun fact: the
.com registry does pretty much the same thing when you lease
example.com through a registrar, delegating the domain to your name servers with NS records. Here we,
example.com, are “leasing out”
acme.example.com to another account, albeit one also under our control, hosted at Amazon Route53.
Set up CNAME delegation
Next, we need to indicate that
acme.example.com will take care of DNS01 challenges for
example.com. This is done via CNAME records. If for instance you plan to issue a certificate for
some.example.com, you’ll need to create the following CNAME record in the example.com zone:

    _acme-challenge.some.example.com. CNAME _acme-challenge.some.acme.example.com.

You’ll need to create similar CNAME records for every domain for which you plan to request certificates. This means that only those with write access to the
example.com DNS zone can authorize certificate creation, and write access to the
acme.example.com zone does not grant the ability to create certificates for arbitrary subdomains.
Lastly, we’ll need to modify our
Issuer and set
cnameStrategy: Follow on the DNS01 solver settings to indicate to
cert-manager that it should follow CNAMEs since it does not do so by default. Our earlier
Issuer definition will look like so:

    apiVersion: cert-manager.io/v1
    kind: Issuer
    metadata:
      name: letsencrypt
    spec:
      acme:
        ...
        solvers:
        - dns01:
            cnameStrategy: Follow
            route53:
              ...

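
For readers who want to see the elided parts spelled out, the following is a rough sketch of what a complete Issuer with a Route53 solver could look like. The region, hosted zone ID, access key ID and Secret name are placeholders I am assuming for illustration. Your setup may instead rely on an IAM role or ambient credentials, in which case the static key fields would not apply.

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <email>
    privateKeySecretRef:
      name: letsencrypt
    solvers:
    - dns01:
        cnameStrategy: Follow                 # follow the _acme-challenge CNAME into acme.example.com
        route53:
          region: us-east-1                   # placeholder region
          hostedZoneID: <hosted-zone-id>      # ID of the acme.example.com hosted zone (placeholder)
          accessKeyID: <access-key-id>        # placeholder; key of an IAM user scoped to that zone
          secretAccessKeySecretRef:
            name: route53-credentials-secret  # hypothetical Secret name
            key: secret-access-key
```

Pinning `hostedZoneID` to the delegated zone, together with an IAM policy limited to that hosted zone, keeps the credentials useless against any other zone in the account.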
Limit the DNS zones the Issuer can act upon
This is not a strict requirement, but it is good practice to tell our
Issuer what zones it can issue certificates for. This prevents an actor inside our cluster who owns another domain from setting up CNAME delegation to
acme.example.com and tricking
cert-manager into issuing certificates using our
Issuer. That scenario alone is not very damaging, but it matters if you use a CA like ZeroSSL, which comes with an External Account Binding to your dashboard account, or if your CA has rate limits like Let’s Encrypt does. Either way, specifying which zones our
Issuer can issue certificates for is straightforward. You only need to populate the
selector field of the solver under .spec.acme.solvers and specify some dnsZones:

    apiVersion: cert-manager.io/v1
    kind: Issuer
    metadata:
      name: letsencrypt
    spec:
      acme:
        ...
        solvers:
        # selector and dns01 belong to the same solver entry
        - selector:
            dnsZones:
            - 'example.com'
          dns01:
            cnameStrategy: Follow
            route53:
              ...

Limit who can request certificates inside the cluster
cert-manager offers two types of issuer resources: ClusterIssuer and
Issuer. The former is available cluster-wide whereas the latter is only available inside the namespace it is created in. We’ll be using the
Issuer resource as we do not want to give everyone in our cluster access to certificate issuance. Especially in a cluster shared by multiple teams, even within the same organization, it is best to restrict access to the
Issuer to prevent accidents. Otherwise another team could, without ill intent, request a wildcard certificate for
*.example.com and be off to the races, and this could later create headaches for the Platform team.
That being said, other teams (namespaces) may need their own certificates for their public-facing services such as
marketing.example.com. There are several approaches we can take to reconcile the need to provide certificates to other namespaces with security.
One approach, if you are using Contour as your ingress controller, is to use the provided
TLSCertificateDelegation custom resource to delegate permission to Contour to read the
Secret containing certificate data from another namespace. This gives you fine-grained control over which certificate can be used by which namespace:

    apiVersion: projectcontour.io/v1
    kind: TLSCertificateDelegation
    metadata:
      name: sales-example-com-delegation
      namespace: cert-manager
    spec:
      delegations:
      - secretName: sales-example-com-tls
        targetNamespaces:
        - sales-team

In this example, we are delegating the certificate stored in the
sales-example-com-tls Secret to the
sales-team namespace. Tenants in the
sales-team namespace can then reference this certificate in and only in an
HTTPProxy resource managed by Contour, like so:

    apiVersion: projectcontour.io/v1
    kind: HTTPProxy
    metadata:
      name: website
      namespace: sales-team
    spec:
      virtualhost:
        tls:
          secretName: cert-manager/sales-example-com-tls
      ...

The nice thing about this approach is that only Contour has permission to read the certificate. If however you are using a different ingress controller, you can alternatively sync the secret to another namespace using Kubernetes Config Syncer as recommended by
cert-manager in their documentation. What this does is sync the
Secret resource containing the certificate to namespaces of your choosing. The downside is that the certificate gets synced into the target namespaces, making its contents entirely visible to anyone with read access to the namespace.
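
As a sketch of the sync approach, the Certificate below asks cert-manager to annotate the generated Secret so that Config Syncer copies it elsewhere. I am assuming the `kubed.appscode.com/sync` annotation and a label selector value here, so check the syncer’s documentation for the version you run. The resource names, the `cert-manager` namespace and the `team=marketing` namespace label are hypothetical.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: marketing-example-com           # hypothetical resource name
  namespace: cert-manager               # namespace holding the Issuer (assumption)
spec:
  secretName: marketing-example-com-tls # hypothetical Secret name
  secretTemplate:
    annotations:
      # Assumed Config Syncer annotation: copy the Secret into namespaces
      # carrying the (hypothetical) label team=marketing
      kubed.appscode.com/sync: "team=marketing"
  dnsNames:
  - marketing.example.com
  issuerRef:
    name: letsencrypt
    kind: Issuer
```

Using a label selector rather than an empty value limits the copy to the namespaces that actually need the certificate.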
Lastly, depending on your ingress controller, there may be support for default certificates. This would involve issuing a wildcard certificate and setting the ingress controller to use it as a default. The downside of this approach is that now any namespace can create
Ingress resources and expose a public website with a valid certificate, so this is not much different from having a ClusterIssuer.
In this post we’ve explored some of the attack surfaces that you may expose yourself to when using
cert-manager and gone through some opinionated steps to use
cert-manager in a more secure fashion, namely using CNAME delegation and limiting who can issue and reference certificates. Granted, in normal scenarios teams inside the same organization do not have ill intent. However, just as leaving passwords on a post-it note next to your computer is a bad idea, leaving important resources exposed inside a shared cluster should be avoided to minimize the attack surface in the event something unfortunate happens.