TLS is the Transport Layer Security protocol, the successor to SSL, and is what secures HTTPS on the web. This page documents how TLS is used across the TPA infrastructure and specifically how we manage the related X.509 certificates that make this work.
- Tutorial
- How-to
- Reference
- Discussion
Tutorial
How to get an X.509 certificate for a domain with Let's Encrypt
- If not already done, clone the letsencrypt-domains git repo:
  git clone letsencrypt@nevii.torproject.org:/srv/letsencrypt.torproject.org/repositories/letsencrypt-domains
- Add your domain name and optional alternative names (SAN) to the domains file:
  $EDITOR domains
- Push the updated domain list to the letsencrypt-domains repo:
  git diff domains
  git add domains
  git commit
  git push
The last command will show output from the dehydrated command, which
runs on the DNS primary (currently nevii) to fetch new keys and update
old ones.
The new keys and certs are then copied to the Puppet server
(currently pauli) under
/srv/puppet.torproject.org/from-letsencrypt/. Puppet then picks
those up in the ssl module. Use the ssl::service resource to
deploy them.
See the "Design" section below for more information on how that works.
See also service/static-component for an example of how to deploy an encrypted virtual host and onion service.
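As a sketch of the domains file edit from the steps above: assuming the file follows dehydrated's usual domains.txt layout (one certificate per line, primary name first, alternative names after it, separated by spaces), a small helper can append an entry only when the primary name is not listed yet. The helper name add_domain and the example hostnames are hypothetical.

```shell
# add_domain FILE PRIMARY [SAN...]
# Append "PRIMARY SAN..." to FILE unless PRIMARY already starts a line.
# Assumes dehydrated's domains.txt layout: one certificate per line,
# primary name first, then space-separated alternative names.
add_domain() {
  domains_file=$1
  shift
  # Look for PRIMARY as the first whitespace-separated field of any line.
  if awk -v p="$1" '$1 == p { found = 1 } END { exit !found }' "$domains_file"; then
    echo "already listed: $1" >&2
    return 1
  fi
  echo "$*" >> "$domains_file"
}
```

For example, `add_domain domains example.torproject.org www.example.torproject.org` adds one line to the file, which can then be reviewed with `git diff domains` before committing.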
Renewing a certificate before its expiry date
If a certificate has been revoked, it should be renewed before its
expiry date. To do so, you can drop a special file in the
per-domain-config directory to change the expiry date range and run
the script by hand.
Create a file matching the primary domain name of the certificate on the DNS master:
cat <<EOF > /srv/letsencrypt.torproject.org/repositories/letsencrypt-domains/per-domain-config/example.torproject.org
RENEW_DAYS="85"
EOF
Here we tell the ACME client (dehydrated) to renew the cert if it expires in less than 85 days (instead of the usual 30 days). For a 90-day certificate, that covers anything more than 5 days old.
Then run the script by hand (or wait for cron to do its thing):
letsencrypt@nevii:~$ /srv/letsencrypt.torproject.org/bin/dehydrated-wrap --cron
[...]
Processing example.torproject.org with alternative names: example.torproject.org
+ Using certificate specific config file!
+ RENEW_DAYS = 85
+ Checking domain name(s) of existing cert... unchanged.
+ Checking expire date of existing cert...
+ Valid till May 18 20:40:45 2020 GMT Certificate will expire
(Less than 85 days). Renewing!
+ Signing domains...
[..]
Then remove the file, so that later runs go back to the normal renewal window.
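The renewal check seen in the log output above ("Less than 85 days") can be sketched as a plain comparison of timestamps. This is an illustration of the condition, not dehydrated's own code; the function name needs_renewal is hypothetical.

```shell
# needs_renewal EXPIRY_EPOCH NOW_EPOCH RENEW_DAYS
# Succeeds (exit 0) when the certificate expires within RENEW_DAYS days,
# which is the condition under which a renewal is triggered.
needs_renewal() {
  [ $(( $1 - $2 )) -lt $(( $3 * 86400 )) ]
}

# 1589834445 is "May 18 20:40:45 2020 GMT", the expiry shown in the log.
expiry=1589834445
now=$(( expiry - 80 * 86400 ))   # pretend we are 80 days before expiry

needs_renewal "$expiry" "$now" 85 && echo "RENEW_DAYS=85: renew"
needs_renewal "$expiry" "$now" 30 || echo "RENEW_DAYS=30: skip"
```

On a real certificate file, `openssl x509 -noout -checkend SECONDS -in CERT` performs the same comparison against the current time.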
Renewing a Harica certificate
15 days before the certificate expiry, Harica sends an email notification to
torproject-admin@torproject.org. The procedure to renew the certificate is
as follows:
- Login to https://harica.gr using TPA credentials
- Follow the renewal procedure in the certificate manager
- Download the new certificate
- On the Puppet server, locate the old certificates at /srv/puppet.torproject.org/from-harica
- Update the .crt, .crt-chain and .crt-chained files with the new cert
- Launch a Puppet agent run on the static mirrors
- Use Tor Browser to verify the new certificate is being offered
Currently (10-2022), the intermediate certificate is signed by "HARICA TLS RSA
Root CA 2021", but this CA is not trusted by Tor Browser. Until it does become
trusted (planned for TB v12) it's necessary to add a cross-signed version of the
CA to the certificate chain (.crt-chained).
The cross-signed CA is available at https://repo.harica.gr but it can also simply be copied from the previous certificate bundle.
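A quick sanity check before the Puppet run is to count how many certificates the chained bundle holds, to confirm the cross-signed CA was not dropped. The sketch below is only an assumption about how one might check this; the helper name pem_cert_count is hypothetical.

```shell
# pem_cert_count FILE
# Print how many PEM-encoded certificates FILE contains.
pem_cert_count() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}
```

Running it on both the previous and the new .crt-chained file should give the same count if the cross-signed CA was carried over.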
Retiring a certificate
Let's Encrypt
If a certificate is no longer in use, it needs to be destroyed. Otherwise, monitoring will warn when the unused certificate is about to expire.
To destroy this certificate, first remove it from the
letsencrypt-domains.git repository, in the domains file.
Then login to the name server (currently nevii) and remove the
certificate's files (in this example, for tpa-bootstrap.torproject.org):
rm -r \
/srv/letsencrypt.torproject.org/var/result/tpa-bootstrap.torproject.org* \
/srv/letsencrypt.torproject.org/var/certs/tpa-bootstrap.torproject.org
When you push the letsencrypt-domains.git repository, this will sync
over to the pauli server and silence the warning.
Harica
To remove a no-longer needed Harica certificate, e.g. for an onion service:
- On the Puppet server, locate the certificate at /srv/puppet.torproject.org/from-harica
- Delete the <onion>.* files
How-to
Certificate management via puppet
We can request (LE-signed) SSL certificates using
dehydrated::certificate. Certificates can also be requested by adding
them to the dehydrated::certificates hiera key. Adding more hosts to
the SAN set is also supported.
The certificate will be issued and installed after a few puppet runs
on the requesting host and the dehydrated_host (nevii). The
upstream puppet module has documented this reasonably well.
On nevii, puppet-dehydrated runs a cron job to regularly request and
update the certificates that puppet wants. See
/opt/dehydrated/requests.json for the requested certs, status.json
for issuance status and potential errors and issues.
The glue between puppet and our dns building setup is in the hook
script we deploy in profile::dehydrated_host (it's the same le-hook
our letsencrypt-domain.git stuff uses, with a slightly different config).
Our zones need to include /srv/dehydrated/var/hook/snippet so we
publish the responses to the LE verification challenge in DNS.
We copied the previous LE account, so our old CAA record is still
appropriate.
Wait to configure a service in puppet until it has a cert
In puppet code, you can check whether the certificate is already
available and make various puppet code conditional on that. We
can use the ready_for_merge fact, which tells puppet-dehydrated it can
build the fullchain_with_key concat because all the parts are in place.
$dn = $trusted['certname']
dehydrated::certificate { $dn: }
$ready_for_config = $facts.dig('dehydrated_domains', $dn, 'ready_for_merge')
Once $ready_for_config evaluates to true, the cert is available in
/etc/dehydrated at (among other places)
/etc/dehydrated/certs/${dn}_fullchain.pem with its key in
/etc/dehydrated/private/${dn}.key. There is also a
/etc/dehydrated/private/${dn}_fullchain_with_key.pem file.
Reload services on cert updates
If you want to refresh a service when its certificate is updated, you can use something like this:
dehydrated::certificate { $service_name: }
~> Class['nginx::service']
Copy the key/cert to a different place
To copy the key, and maybe also the certificate, to a different place and user, this works for weasel's home assistant setup at home:
$key_dir = $facts['dehydrated_config']['key_dir']
$key_file = "${key_dir}/${domain}.key"
$crt_dir = $facts['dehydrated_config']['crt_dir']
$crt_full_chain = "${crt_dir}/${domain}_fullchain.pem"
file { '/srv/ha-share/ssl':
  ensure => directory,
  owner  => 'root',
  group  => 'ha-backup',
  mode   => '0750',
}
Dehydrated_key[ $key_file ]
-> file { "/srv/ha-share/ssl/${domain}.key":
  ensure => file,
  owner  => 'root',
  group  => 'ha-backup',
  mode   => '0440',
  source => $key_file,
}
Concat[ $crt_full_chain ]
-> file { "/srv/ha-share/ssl/${domain}.crt":
  ensure => file,
  owner  => 'root',
  group  => 'ha-backup',
  mode   => '0440',
  source => $crt_full_chain,
}
If this becomes a common pattern, we should abstract this into its own defined type.
Pager playbook
Digicert validation emails
If you get email from DigiCert Validation, ask the Tor Browser team: they use it to sign code (see "Design" below for more information about which CAs are in use).
Waiting for master to update
If a push to the Let's Encrypt repository loops on a warning like:
remote: Waiting for master to update torproject.net (for _acme-challenge.pages.torproject.net) from 2021012804. Currently at 2021012804..
It might be because the Let's Encrypt hook is not really changing the zonefile, and not incrementing the serial number (as hinted above). This can happen if you force-push an empty change to the repository and/or a previous hook failed to get a cert or was interrupted.
The trick then is to abort the above push, then manually edit (yes)
the generated zonefile (here for the torproject.net domain, as in the
above example):
$EDITOR /srv/dns.torproject.org/var/generated/torproject.net
... and remove the _acme-challenge line. Then you should update the
zone with another, unrelated change, to trigger a serial
number change. For example, you could add a random TXT record:
ynayMF5xckel8uGpo0GdVEQjM7X9 IN TXT "random record to trigger a zone rebuild, should be removed"
And push that change (in dns/domains.git). Then the serial number
will change, and the infrastructure will notice the _acme-challenge
record is gone. Then you can re-do the certification process and it
should go through.
Don't forget to remove the random TXT record created above once
everything is done.
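The procedure above relies on the zone serial number increasing. As an illustration only, assuming the common YYYYMMDDNN convention suggested by serials such as 2021012804, the next serial could be computed as follows. This is not the hook's actual code; the function name next_serial and the choice to restart the counter at 00 are assumptions.

```shell
# next_serial CURRENT TODAY
# CURRENT is the zone's SOA serial, TODAY is the date as YYYYMMDD.
# Assumes the YYYYMMDDNN convention; the counter restarting at 00 is
# an assumption made for this sketch.
next_serial() {
  if [ $(( $1 / 100 )) -lt "$2" ]; then
    echo $(( $2 * 100 ))   # first change of the day: YYYYMMDD00
  else
    echo $(( $1 + 1 ))     # later change on the same day: bump the counter
  fi
}

next_serial 2021012804 20210128   # prints 2021012805
next_serial 2021012804 20210129   # prints 2021012900
```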
Challenge is invalid!
If you get an email that looks like:
Subject: Cron <letsencrypt@nevii> sleep $(( RANDOM % 3600 )) && chronic dehydrated-wrap --cron
[...]
Waiting for master to update torproject.org (for _acme-challenge.dip.torproject.org) from 2021021304. Currently at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
SOA nevii.torproject.org. hostmaster.torproject.org. 2021021305 10800 3600 1814400 3601 from server 49.12.57.135 in 0 ms.
SOA nevii.torproject.org. hostmaster.torproject.org. 2021021304 10800 3600 1814400 3601 from server 194.58.198.32 in 11 ms.
SOA nevii.torproject.org. hostmaster.torproject.org. 2021021305 10800 3600 1814400 3601 from server 95.216.159.212 in 26 ms.
SOA nevii.torproject.org. hostmaster.torproject.org. 2021021305 10800 3600 1814400 3601 from server 89.45.235.22 in 29 ms.
SOA nevii.torproject.org. hostmaster.torproject.org. 2021021305 10800 3600 1814400 3601 from server 38.229.72.12 in 220 ms.
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for master to update torproject.org (for _acme-challenge.gitlab.torproject.org) from 2021021304. Currently at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
+ Responding to challenge for dip.torproject.org authorization...
+ Cleaning challenge tokens...
+ Challenge validation has failed :(
ERROR: Challenge is invalid! (returned: invalid) (result: ["type"] "dns-01"
["status"] "invalid"
["error","type"] "urn:ietf:params:acme:error:dns"
["error","detail"] "During secondary validation: DNS problem: query timed out looking up CAA for torproject.org"
["error","status"] 400
["error"] {"type":"urn:ietf:params:acme:error:dns","detail":"During secondary validation: DNS problem: query timed out looking up CAA for torproject.org","status":400}
This happens when the DNS challenge takes too long to deploy and the validation is refused. This is harmless: it will eventually succeed. Ignore the message or, if you want to make sure, run the cron job by hand:
ssh -tt root@nevii.torproject.org sudo -u letsencrypt /srv/letsencrypt.torproject.org/bin/dehydrated-wrap --cron
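To see which name servers are still behind, the SOA lines in the email above can be filtered by serial number. The sketch below assumes input in that exact format, which is also what `dig +nssearch ZONE` prints; the function name lagging_servers is hypothetical.

```shell
# lagging_servers SERIAL
# Read "SOA ... from server IP in N ms." lines on stdin and print the
# address and serial of every server that is not yet at SERIAL.
lagging_servers() {
  awk -v want="$1" '$1 == "SOA" && $4 != want { print $11, $4 }'
}

lagging_servers 2021021305 <<'EOF'
SOA nevii.torproject.org. hostmaster.torproject.org. 2021021305 10800 3600 1814400 3601 from server 49.12.57.135 in 0 ms.
SOA nevii.torproject.org. hostmaster.torproject.org. 2021021304 10800 3600 1814400 3601 from server 194.58.198.32 in 11 ms.
EOF
# prints: 194.58.198.32 2021021304
```

Against the live zone, this would be `dig +nssearch torproject.org | lagging_servers SERIAL`, with SERIAL being the serial the primary currently serves.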
db.torproject.org is WARNING: Certificate will expire
This message indicates the upcoming expiration of the OpenLDAP self-signed TLS certificate.
See service/ldap#server-certificate-renewal for instructions on how to renew it.
Disaster recovery
No disaster recovery plan yet (TODO).
Reference
Installation
There is no documentation on how to deploy this service from
scratch. To deploy a new cert, see the above section and the
ssl::service Puppet resource.
SLA
TLS is critical and should be highly available when relevant. It should fail closed: if a connection fails a security check, it should not be allowed.
Design
TLS is one of two major transport security protocols used at TPA (the other being service/ipsec). It is used by web servers (Apache, HAProxy, Nginx), backup servers (Bacula), mail servers (Postfix), and possibly more.
Certificate generation is done by git hooks for Let's Encrypt or by a
makefile and cron job for auto-ca, see below for details.
Certificate authorities in use at Tor
This document mostly covers the Let's Encrypt certificates used by websites and other services managed by TPA.
But there are other certificate authorities in use inside TPA and, more broadly, at Tor. Here's the list of known CAs in operation at the time of writing (2020-04-15):
- Let's Encrypt: automatically issues certificates for most websites and domains, managed by TPA
- Globalsign: used by the Fastly CDN used to distribute TBB updates (cdn-fastly.torproject.org)
- Digicert: used by other teams to sign software releases for Windows
- Harica: used for HTTPS on the donate.tpo onion service
- Puppet: our configuration management infrastructure has its own X.509 certificate authority which allows "Puppet agents" to authenticate and verify the "Puppet Master", see our documentation and upstream documentation for details
- LDAP: our OpenLDAP server uses a custom self-signed X.509 certificate authority that is distributed to clients via Puppet, see the documentation for instructions to renew this certificate manually
- internal "auto-ca": all nodes in Puppet get their own X.509 certificate signed by a standalone, self-signed X.509 certificate, documented below. It is used for backups (Bacula) and mail delivery (Postfix)
- Ganeti: each cluster has a set of self-signed TLS certificates in /var/lib/ganeti/*.pem, used in the API and elsewhere. There is talk of having a cluster-specific CA but it has so far not been implemented
- contingency keys: three public/private RSA key pairs stored in the TPA password manager (in ssl-contingency-keys) that are part of the preloaded allow list shipped by Google Chrome (and therefore Firefox), see tpo/tpa/team#41154 for a full discussion on those
See also the alternative certificate authorities we could consider.
Certificate Authority Authorization (CAA)
torproject.org and torproject.net implement CAA records in DNS to restrict
which certificate authorities are allowed to issue certificates for these
domains and under what restrictions.
For Let's Encrypt domains, the CAA record also specifies which account is
allowed to request certificates. This is represented by an "account uri", and
is found among certbot and dehydrated configuration files. Typically, the
file is named account_id.json.
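As an illustration of such a record, the sketch below prints a zone-file CAA line that restricts issuance to one Let's Encrypt account through the accounturi parameter (RFC 8657). The function name caa_record and the account number are hypothetical; the real URI comes from the ACME client's account files.

```shell
# caa_record DOMAIN ACCOUNT_URI
# Print a zone-file CAA record that only lets the given Let's Encrypt
# account issue certificates for DOMAIN.
caa_record() {
  printf '%s. IN CAA 0 issue "letsencrypt.org; accounturi=%s"\n' "$1" "$2"
}

# Hypothetical account URI, for illustration only.
caa_record torproject.org https://acme-v02.api.letsencrypt.org/acme/acct/12345
```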
Internal auto-ca
The internal "auto-ca" is a standalone certificate authority running
on the Puppet master (currently pauli), in
/srv/puppet.torproject.org/auto-ca.
The CA runs based on a Makefile which takes care of creating,
revoking, and distributing certificates to all nodes. Certificates are
valid for a year (365 days, actually). If a certificate is going to
expire in less than 30 days, it gets revoked and removed.
The makefile then iterates over the known hosts (as per
/var/lib/misc/thishost/ssh_known_hosts, generated from service/ldap) to
create (two) certificates for each host. This makes sure certs get
renewed before their expiry. It will also remove certificates from
machines that are not known, which is the source of the revoked client emails TPA gets when a machine gets retired.
The two certificates created for each host are a "clientcert"
(in clientcerts/) and a "server" (?) cert (in certs/). The former
is used by Bacula and Postfix clients to authenticate with the central
servers for backups and mail delivery, respectively. The latter is
used by those servers to authenticate to their clients but is also
used as the default HTTPS certificate on new Apache hosts.
Once all certs are created, revoked, and/or removed, they get copied into Puppet's "$vardir", in the following locations:
- /var/lib/puppetserver/auto-ca/certs/: server certs
- /var/lib/puppetserver/auto-ca/clientcerts/: client certs
- /var/lib/puppetserver/auto-ca/clientcerts/fingerprints: colon-separated SHA256 fingerprints of all "client certs", one per line
- /var/lib/puppetserver/auto-ca/certs/ca.crt: CA's certificate
- /var/lib/puppetserver/auto-ca/certs/ca.crl: certificate revocation list
In order for these paths to be available during catalog compilation, each
environment's modules/ssl/files is a symlink to
/var/lib/puppetserver/auto-ca.
This work gets run from the Puppet user's crontab, which calls make -s install every day.
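To check whether a given client certificate is known to the auto-ca, its fingerprint can be looked up in the fingerprints file described above. This is a minimal sketch assuming one fingerprint per line, as stated; the function name fingerprint_listed is hypothetical and the letter case used in the file is not known, so the comparison ignores case.

```shell
# fingerprint_listed FINGERPRINT FILE
# Succeeds when FINGERPRINT appears as a whole line of FILE
# (compared case-insensitively, since hex digits may be in either case).
fingerprint_listed() {
  grep -q -i -x -F -- "$1" "$2"
}
```

The fingerprint of a certificate file can be obtained with `openssl x509 -noout -fingerprint -sha256 -in CERT`, which prints it after a `Fingerprint=` prefix.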
Let's Encrypt workflow
When you push to the git repository on the primary DNS server
(currently nevii.torproject.org):
- the post-receive hook runs dehydrated-wrap --cron with a special BASE variable that points dehydrated at our configuration, in /srv/letsencrypt.torproject.org/etc/dehydrated-config
- Through that special configuration, the dehydrated command is configured to call a custom hook (bin/le-hook) which implements logic around the DNS-01 authentication challenge, notably adding challenges, bumping serial numbers in the primary nameserver, and waiting for secondaries to sync. Note that there's a configuration file for that hook in /etc/dsa/le-hook.conf.
- The le-hook also pushes the changes around. The hook calls the bin/deploy file which installs the certificate files in var/result.
- CODE REMOVED: It also generates a Public Key Pin (PKP) hash with the bin/get-pin command and appends Diffie-Hellman parameters (dh-$size.pem) to the certificate chain.
- It finally calls the bin/push command which runs rsync to the Puppet server, which in turn hardcodes the place where those files are dumped (in pauli:/srv/puppet.torproject.org/from-letsencrypt) through its authorized_keys file.
- Finally, those certificates are collected by Puppet through the ssl module. Pay close attention to how the tor-puppet/modules/apache2/templates/ssl-key-pins.erb template works: it will not deploy key pinning if the backup .pin file is missing.
Note that by default, the dehydrated config includes
PRIVATE_KEY_RENEW="no" which means private keys are not regenerated
when a new cert is requested.
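For orientation, a dehydrated configuration fragment for this kind of setup might look like the sketch below. The variable names are dehydrated's own, but only the PRIVATE_KEY_RENEW value is stated above; the hook path is an assumption built from the paths mentioned in the workflow.

```shell
# Sketch of a dehydrated configuration fragment (shell syntax).
# Only PRIVATE_KEY_RENEW="no" is stated in this page; the other values
# are assumptions for illustration.
CHALLENGETYPE="dns-01"                               # DNS-01 challenge, handled by the hook
HOOK="/srv/letsencrypt.torproject.org/bin/le-hook"   # assumed absolute path of bin/le-hook
PRIVATE_KEY_RENEW="no"                               # keep the existing private key on renewal
```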
Issues
There is no issue tracker specifically for this project. File or search for issues in the team issue tracker with the ~TLS label.
Monitoring and testing
When an HTTPS certificate is configured on a host, it is automatically
monitored by default, through the ssl::service resource in Puppet.
Logs and metrics
Other documentation
TLS and X.509 are a vast application domain with lots of documentation.
TODO: identify key TLS docs that should be linked to here. RFCs? LE upstream docs?
The letsencrypt-domains.git repository is actually a fork of the
"upstream" project, from Debian System Administrators (DSA), see
the upstream git repository for more information.
Discussion
Overview
There are no plans to do major changes to the TLS configuration, although review of the cipher suites is in progress (as of April 2020). We should have mechanisms to do such audits on a more regular basis, and facilitate changes of those configurations over the entire infrastructure.
Goals
TODO: evaluate alternatives to the current letsencrypt deployment systems and see if we can reduce the number of CAs.
Must have
Nice to have
Non-Goals
Approvals required
Proposed Solution
Cost
Alternatives considered
Puppet for cert management
We could move more certificate management tasks to Puppet.
ACME issuance
For ACME-compatible certificate authorities (really, just Let's Encrypt), we know about the following Puppet modules that could fit the bill:
- bzed/dehydrated - from a Debian developer, uses dehydrated, weasel uses this for DNS-01 based issuance, creates CSR on client and cert on DNS server, converges over 4-6 runs
- puppet/letsencrypt - from voxpupuli, certbot wrapper, issues certificates on clients
Worth noting is that currently, only certbot supports the onion-csr-01
challenge via the certbot-onion
plugin, although adding support for it to dehydrated is not expected to be
particularly difficult.
CA management
The auto-ca machinery could be replaced by Puppet code. Here are
modules that might be relevant:
- mmack/cfssl: interfaces Cloudflare's cfssl "PKI/TLS swiss army knife", used at WMF
- rehan/easyrsa: wrapper around easy-rsa, itself a wrapper around OpenSSL, not well documented
- Aethylred/keymaster: handle X509 CAs, but also SSH host keys, which might be in conflict with our existing code
- puppet/openssl: a bit bare-bones, no revocation support
Trocla also has support for x509 certs although it assumes there is already a CA present, and it does not support EC keys.
We could also leverage the ACME protocol designed by Let's Encrypt to run our own CA instead of just OpenSSL, although that might be overkill.
In general, it would be preferable to reuse an existing solution rather than maintain our own software in Make.
Other Certificate Authorities
There are actually a few other ACME-compatible certificate authorities which issue free certificates. The https.dev site lists a few alternatives which are, at the time of writing:
- Let's Encrypt - currently in use
- ZeroSSL - Sectigo reseller
- BuyPass - Norway CA
- Sectigo - formerly known as Comodo CA
- InCommon - also Sectigo?
HPKP
HPKP used to be used at Tor, but we expired it in March 2020 and completely stopped sending headers in October 2020. It is generally considered deprecated: Google Chrome announced its removal in 2017, and it should generally not be used anymore. See issue 33592 for details, and the history of this page for previous instructions.