cert-manager Done Right
If you are using Kubernetes, chances are you are also using cert-manager
to automate certificate creation and renewal, whether it is for securing intra-cluster communications or to get a certificate from a fully-fledged Certificate Authority (CA) for public-facing websites. While cert-manager
is very convenient, it needs a lot of credentials to do its magic, and in a shared cluster this can present a security risk. In this post, I will briefly review some of the risks that you may encounter and present a way to properly set up cert-manager
to minimize potential security issues.
How cert-manager works
First, let’s review what happens when you generate a certificate through cert-manager
. You start by creating an Issuer
or ClusterIssuer
that holds the required information to tell cert-manager
how to communicate with a CA supporting the ACME protocol, and in some cases some extra identifying information that cert-manager
can provide to the CA on your behalf to identify you. When using the ACME DNS01 mechanism to prove domain ownership, credentials to the DNS provider are also needed for cert-manager
to create the TXT record as part of the DNS challenge.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <email>
    privateKeySecretRef:
      name: letsencrypt
    solvers:
    - dns01:
        cloudflare:
          email: <email>
          apiTokenSecretRef:
            name: cloudflare-api-token-secret
            key: api-token
Once an Issuer
or ClusterIssuer
is installed, you can request certificates by placing a Certificate
resource referencing the issuer. cert-manager
will then:
- create the Certificate Signing Request (CSR)
- send it to the CA using the information provided in the Issuer resource
- receive a DNS challenge from the CA
- use the DNS provider credentials to create the required TXT record
- periodically check the challenge status and retrieve the signed certificate from the CA
- delete the TXT record used for the DNS challenge
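As a minimal sketch of the request side, a Certificate resource referencing the Issuer above could look like the following. The resource name, the Secret name, and the some.example.com hostname are illustrative assumptions, not values from a real setup.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: some-example-com            # hypothetical name
spec:
  # Secret that will receive the signed certificate and its private key
  secretName: some-example-com-tls  # hypothetical name
  dnsNames:
    - some.example.com
  issuerRef:
    # Must match the Issuer created earlier, in the same namespace
    name: letsencrypt
    kind: Issuer
```

The Secret named by secretName is created in the same namespace as the Certificate once the CA has signed the request.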
What could go wrong
There are a lot of moving parts and actors involved when using cert-manager
, and therefore multiple possible attack vectors. A malicious actor with access to the cluster can nicely ask cert-manager
to create arbitrary certificates if the issuer is a ClusterIssuer
, and cert-manager
will be happy to comply.
Worse, if cert-manager
’s read/write credentials for the DNS provider somehow find their way into the wrong hands, the attacker can freely modify DNS records. Depending on your DNS provider, the API token used for DNS updates may also be valid for other API endpoints such as account management, resulting in an account takeover. In cases where the DNS provider also acts as a registrar, the attacker could also transfer the domain out or modify the NS records to point to a DNS provider entirely under their control. They would then be able to redirect your visitors to servers under their control, with completely valid TLS certificates created at their discretion.
Properly setting up cert-manager
Now that we’ve seen some of the possible risks involved with an unnecessarily permissive use of cert-manager
, we’ll explore some ways to protect ourselves. For our purposes, we’ll be using Amazon Route53 as our DNS provider, as its IAM settings allow for more granular permissions, which suits our purposes well, although any other DNS provider could do.
CAA records
First, we’ll edit the example.com
zone to add some CAA records authorizing the CA we are using, if this has not been done already. CAA records allow us to specify which CAs are allowed to issue certificates for the entire zone. This is not perfect: an attacker can simply use the same CA and, although it would be a clear violation of CA requirements, an unauthorized CA could still issue certificates, whether mistakenly or maliciously. It’s still better than nothing and adding a record is easy, so there’s no reason not to use CAA records.
Create a DNS zone for delegation purposes
Next, we’ll create a DNS zone for the sole purpose of solving ACME challenges. In this example, we’ll use acme.example.com
, a subdomain of example.com
, although any other domain would serve that purpose just fine. Ultimately, cert-manager
only needs to create a temporary TXT record, so giving it free write access to the entire DNS zone is overkill. With DNS01, when you request a certificate for some.example.com, you will be asked to create a TXT record at _acme-challenge.some.example.com with a randomized text value, the challenge. The CA will then make a DNS query and expect to get the challenge value back.
There is actually no requirement for the TXT record to be exactly at _acme-challenge.some.example.com, as long as a TXT query against _acme-challenge.some.example.com ultimately leads to the challenge value. It is therefore completely valid to instead create a CNAME record at _acme-challenge.some.example.com pointing to a TXT record at _acme-challenge.some.other.example.com. This is known as CNAME delegation of ACME challenge TXT records, and the cert-manager documentation has a modest mention of it.
Delegate control for the new hosted zone
After creating the acme.example.com
zone, we are given a set of NS records with instructions to add them at the registrar. As this is a subdomain of example.com
, however, we won’t be doing that. Instead, we’ll go back to our example.com DNS provider, Cloudflare here for illustration purposes, and add those NS records under acme.example.com. Our zones should now look something like this:
# example.com
example.com. IN NS ivan.ns.cloudflare.com.
example.com. IN NS cheryl.ns.cloudflare.com.
acme.example.com. IN NS ns-162.awsdns-10.com.
acme.example.com. IN NS ns-2004.awsdns-12.co.uk.
acme.example.com. IN NS ns-2233.awsdns-14.net.
acme.example.com. IN NS ns-1111.awsdns-11.org.
# acme.example.com
acme.example.com. IN NS ns-162.awsdns-10.com.
acme.example.com. IN NS ns-2004.awsdns-12.co.uk.
acme.example.com. IN NS ns-2233.awsdns-14.net.
acme.example.com. IN NS ns-1111.awsdns-11.org.
What we have just done is delegate control of the acme.example.com subdomain of example.com to another DNS provider, Amazon Route53 in this instance. Since acme.example.com will only be used to complete ACME DNS01 challenges, a takeover of that zone does far less damage than a takeover of the root domain example.com. When creating a service account to give to cert-manager, we’re also able to limit its write scope to the acme.example.com zone, which is convenient if you happen to manage a lot of zones in the same account. Fun fact: the .com registry does pretty much the same thing when you lease example.com through a registrar, by publishing NS records for it in the .com zone. Here we, example.com, are “leasing out” acme.example.com in the same way to another account hosted at Amazon Route53, albeit one also under our control.
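To illustrate limiting the write scope, here is a sketch of an IAM policy restricted to the delegated zone, written as a CloudFormation resource so that it stays in YAML. The resource name and the <acme-zone-id> placeholder are assumptions, and the list of actions reflects my understanding of what the cert-manager Route53 solver needs, so check it against the current cert-manager documentation before relying on it.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  CertManagerDns01Policy:             # hypothetical resource name
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Lets cert-manager poll the status of a submitted record change
          - Effect: Allow
            Action: route53:GetChange
            Resource: "arn:aws:route53:::change/*"
          # Record reads and writes, scoped to the acme.example.com zone only
          - Effect: Allow
            Action:
              - route53:ChangeResourceRecordSets
              - route53:ListResourceRecordSets
            Resource: "arn:aws:route53:::hostedzone/<acme-zone-id>"   # placeholder
```

This sketch assumes the solver is given the hosted zone ID explicitly. If it has to look the zone up by name instead, it would also need route53:ListHostedZonesByName.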
Set up CNAME delegation
Next, we need to indicate that acme.example.com
will take care of DNS01 challenges for example.com
. This is done via CNAME records. If for instance you plan to issue a certificate for some.example.com
, you’ll need to create the following CNAME record in the example.com
zone:
_acme-challenge.some.example.com. CNAME _acme-challenge.some.acme.example.com.
You’ll need to create a similar CNAME record for every domain you plan to request certificates for. This means that only those with write access to the example.com DNS zone can authorize certificate creation, and write access to the acme.example.com zone alone does not grant the ability to create certificates for arbitrary subdomains.
Lastly, we’ll need to modify our Issuer
and set cnameStrategy: Follow
on the DNS01 solver settings to indicate to cert-manager
that it should follow CNAMEs since it does not do so by default. Our earlier Issuer
definition will look like so:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
spec:
  acme:
    ...
    solvers:
    - dns01:
        cnameStrategy: Follow
        route53:
          ...
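For reference, a fuller sketch of what the Route53 solver section might contain is shown below. The region, the namespace, the Secret name and key, and the angle-bracket placeholders are all assumptions for illustration; the parts elided above depend on your own account setup.

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
  namespace: cert-manager                   # assumption: Issuer lives here
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <email>
    privateKeySecretRef:
      name: letsencrypt
    solvers:
      - dns01:
          # Follow the _acme-challenge CNAME into the delegated zone
          cnameStrategy: Follow
          route53:
            region: us-east-1               # assumption
            hostedZoneID: <acme-zone-id>    # placeholder: the acme.example.com zone
            accessKeyID: <access-key-id>    # placeholder
            secretAccessKeySecretRef:
              name: route53-credentials     # hypothetical Secret name
              key: secret-access-key        # hypothetical key inside that Secret
```

Pinning hostedZoneID keeps the solver working against the delegated zone only, which matches the narrow credentials it is meant to hold.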
Limit the DNS zones the Issuer can act upon
This is not a strict requirement, but it is good practice to tell our Issuer
what zones it can issue certificates for. This prevents some actor inside our cluster who owns another domain from setting up CNAME delegation to acme.example.com and tricking cert-manager into issuing certificates using our Issuer, although the impact of that would be limited. Where this is more useful is if you use a CA like ZeroSSL, which comes with an External Account Binding to their dashboard account, or if your CA has rate limits like Let’s Encrypt does. Either way, specifying which zones our Issuer can issue certificates for is straightforward. You only need to populate the selector field of the solver entry under .spec.acme.solvers and specify some dnsZones.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
spec:
  acme:
    ...
    solvers:
    - selector:
        dnsZones:
        - 'example.com'
      dns01:
        cnameStrategy: Follow
        route53:
          ...
Limit who can request certificates inside the cluster
cert-manager
offers two types of Issuer resources: ClusterIssuer
and Issuer
. The former is available cluster-wide whereas the latter is only available inside the namespace it is created in. We’ll be using the Issuer
resource as we do not want to give everyone in our cluster access to certificate issuance. Especially in a cluster shared by multiple teams, even within the same organization, it is best to restrict access to the Issuer
to prevent accidents. Otherwise another team could, without ill intent, request a wildcard certificate for *.example.com
and be off to the races, and this could later create headaches for the Platform team.
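Because an Issuer only serves Certificate resources created in its own namespace, access can be narrowed further with ordinary Kubernetes RBAC. The following is a sketch only: it assumes the Issuer lives in a cert-manager namespace, and the Role name and the platform-team group are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: certificate-requester        # hypothetical name
  namespace: cert-manager            # assumption: namespace holding the Issuer
rules:
  # Allow managing Certificate resources in this namespace only
  - apiGroups: ["cert-manager.io"]
    resources: ["certificates"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: certificate-requester-platform
  namespace: cert-manager
subjects:
  - kind: Group
    name: platform-team              # hypothetical group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: certificate-requester
  apiGroup: rbac.authorization.k8s.io
```

Anyone not bound to such a Role in that namespace cannot create Certificate resources there, and so cannot use the Issuer.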
That being said, other teams (namespaces) may need their own certificates for their public-facing services such as sales.example.com
or marketing.example.com
. There are several approaches we can take to reconcile the need to provide certificates to other namespaces and security.
One approach, if you are using Contour as your ingress controller, is to use the provided TLSCertificateDelegation custom resource to delegate permission to Contour to read the Secret containing certificate data from another namespace. This gives you fine-grained control over which certificate can be used by which namespace:
apiVersion: projectcontour.io/v1
kind: TLSCertificateDelegation
metadata:
  name: sales-example-com-delegation
  namespace: cert-manager
spec:
  delegations:
  - secretName: sales-example-com-tls
    targetNamespaces:
    - sales-team
In this example, we are delegating the certificate stored in the sales-example-com-tls
Secret to the sales-team
namespace. Tenants in the sales-team
namespace can then reference this certificate in and only in an HTTPProxy
resource managed by Contour, like so:
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: website
  namespace: sales-team
spec:
  virtualhost:
    tls:
      secretName: cert-manager/sales-example-com-tls
  ...
The nice thing about this approach is that only Contour has permission to read the certificate. If however you are using a different ingress controller, you can alternatively sync the secret to another namespace using Kubernetes Config Syncer as recommended by cert-manager
in their documentation. What this does is sync the Secret
resource containing the certificate to namespaces of your choosing. The downside is that the certificate gets synced into the target namespaces, making its contents entirely visible to anyone with read access to the namespace.
Lastly, depending on your ingress controller, there may be support for default certificates. This would involve issuing a wildcard certificate and setting the ingress controller to use it as a default. The downside of this approach is that now any namespace can create Ingress
resources and expose a public website with a valid certificate, so this is not much different from having a ClusterIssuer
.
Conclusion
In this post we’ve explored some of the attack surfaces that you may expose yourself to when using cert-manager
and went through some opinionated steps to use cert-manager
in a more secure fashion, namely using CNAME delegation and limiting who can issue and reference certificates. Granted, in normal scenarios teams inside the same organization do not have ill intent. However, just as leaving passwords on a post-it note next to your computer is a bad idea, leaving important resources exposed inside a shared cluster environment should be avoided to minimize the attack surface in the event something unfortunate happens.