TLS is the Transport Layer Security protocol, previously known as SSL and best known on the web as the security layer behind HTTPS. This page documents how TLS is used across the TPA infrastructure and specifically how we manage the related X.509 certificates that make this work.

Tutorial

How to get an X.509 certificate for a domain with Let's Encrypt

  1. If not already done, clone the letsencrypt-domains git repository:

    git clone letsencrypt@nevii.torproject.org:/srv/letsencrypt.torproject.org/repositories/letsencrypt-domains
    
  2. Add your domain name and optional alternative names (SAN) to the domains file:

    $EDITOR domains
    
  3. Push the updated domain list to the letsencrypt-domains repo:

    git diff domains
    git add domains
    git commit
    git push
    

The last command prints the output of the dehydrated command, which runs on the DNS primary (currently nevii) to fetch new keys and update old ones.

The new keys and certs are then copied to the Puppet server (currently pauli) under /srv/puppet.torproject.org/from-letsencrypt/. Puppet then picks those up in the ssl module. Use the ssl::service resource to deploy them.

See the "Design" section below for more information on how that works.

See also service/static-component for an example of how to deploy an encrypted virtual host and onion service.
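For step 2 above, a small helper can guard against listing the same certificate twice. This is only a sketch: it assumes the domains file follows the usual dehydrated layout (one certificate per line, primary name first, alternative names after it), and the add_domain function name is made up for illustration.

```shell
# Sketch: append a certificate entry to the domains file unless the
# primary name is already listed.
# ASSUMPTION: one certificate per line, primary name first, SANs after it.
add_domain() {
    domains_file=$1; shift
    primary=$1
    # Look for the primary name as the first whitespace-separated field.
    if awk -v d="$primary" '$1 == d { found = 1 } END { exit !found }' "$domains_file"; then
        echo "already listed: $primary" >&2
        return 1
    fi
    # "$*" joins the primary name and any SANs with single spaces.
    echo "$*" >> "$domains_file"
}
```

Run from the clone, this could look like: add_domain domains example.torproject.org www.example.torproject.org, followed by the git diff / commit / push sequence from step 3.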

Renewing a certificate before its expiry date

If a certificate has been revoked, it should be renewed before its expiry date. To do so, you can drop a special file in the per-domain-config directory to change the expiry date range and run the script by hand.

Create a file matching the primary domain name of the certificate on the DNS master:

cat <<EOF > /srv/letsencrypt.torproject.org/repositories/letsencrypt-domains/per-domain-config/example.torproject.org
RENEW_DAYS="85"
EOF

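Here we tell the ACME client (dehydrated) to renew the cert if it expires in less than 85 days (instead of the default 30 days). Since Let's Encrypt certificates are valid for 90 days, this effectively renews any certificate that is more than 5 days old.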

Then run the script by hand (or wait for cron to do its thing):

letsencrypt@nevii:~$ /srv/letsencrypt.torproject.org/bin/dehydrated-wrap --cron
[...]
Processing example.torproject.org with alternative names: example.torproject.org
 + Using certificate specific config file!
   + RENEW_DAYS = 85
 + Checking domain name(s) of existing cert... unchanged.
 + Checking expire date of existing cert...
 + Valid till May 18 20:40:45 2020 GMT Certificate will expire
(Less than 85 days). Renewing!
 + Signing domains...
[..]

Then remove the file.
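To see ahead of time whether a given threshold would trigger a renewal, the same check can be approximated with openssl alone. This is a sketch, not what the wrapper script runs: the function names are made up, and the certificate path passed in is up to the caller.

```shell
# Convert a RENEW_DAYS-style value into the number of seconds expected
# by `openssl x509 -checkend`.
renew_days_to_seconds() {
    echo $(( $1 * 86400 ))
}

# Succeeds (exit 0) if the certificate file $1 expires within $2 days,
# that is, if a threshold of $2 days would trigger a renewal.
# Note: an unreadable certificate file also counts as "would renew".
would_renew() {
    # -checkend exits 0 when the cert is still valid after N seconds,
    # so the result is inverted here.
    ! openssl x509 -checkend "$(renew_days_to_seconds "$2")" -noout -in "$1" >/dev/null 2>&1
}
```

For example, would_renew /path/to/cert.pem 85 && echo "would renew" reports whether the 85 day threshold from the override file applies to that certificate.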

Renewing a Harica certificate

15 days before the certificate expiry, Harica sends an email notification to torproject-admin@torproject.org. The procedure to renew the certificate is as follows:

  • Login to https://harica.gr using TPA credentials
  • Follow the renewal procedure in the certificate manager
  • Download the new certificate
  • On the Puppet server, locate the old certificates at /srv/puppet.torproject.org/from-harica
  • Update the .crt, .crt-chain and .crt-chained files with the new cert
  • Launch a Puppet agent run on the static mirrors
  • Use Tor Browser to verify the new certificate is being offered

Currently (10-2022), the intermediate certificate is signed by "HARICA TLS RSA Root CA 2021", but this CA is not trusted by Tor Browser. Until it does become trusted (planned for TB v12) it's necessary to add a cross-signed version of the CA to the certificate chain (.crt-chained).

The cross-signed CA is available at https://repo.harica.gr but it may be simply copied from the previous certificate bundle.
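Assembling the chained file by hand is mostly a matter of concatenating PEM files in the right order. The sketch below assumes the usual TLS ordering (server certificate first, then the intermediate, then the cross-signed CA); the build_chain name and the input file names in the usage line are hypothetical, so compare the result with the previous bundle before deploying.

```shell
# Concatenate PEM files, in the order given, into one bundle on stdout.
# A newline is added after any file that lacks a trailing one, so that
# PEM blocks never run together on the same line.
build_chain() {
    for pem in "$@"; do
        cat "$pem"
        # $(tail -c 1) is empty when the last byte is a newline.
        [ -z "$(tail -c 1 "$pem")" ] || echo
    done
}
```

Usage could look like: build_chain new.crt intermediate.crt cross-signed-ca.crt > example.crt-chained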

Retiring a certificate

Let's Encrypt

If a certificate is not in use, it needs to be destroyed. Monitoring will warn about the certificate expiring if it's not in use.

To destroy this certificate, first remove it from the letsencrypt-domains.git repository, in the domains file.

Then log in to the name server (currently nevii) and delete the certificate files (here for the tpa-bootstrap.torproject.org certificate):

rm -r \
    /srv/letsencrypt.torproject.org/var/result/tpa-bootstrap.torproject.org* \
    /srv/letsencrypt.torproject.org/var/certs/tpa-bootstrap.torproject.org

When you push the letsencrypt-domains.git repository, this will sync over to the pauli server and silence the warning.
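Because the rm -r above is destructive, it can help to list what would be removed first. The helper below is a sketch: the le_cert_paths name and the LE_BASE variable are made up, and it only prints the same two path patterns used in the command above.

```shell
# Base directory of the Let's Encrypt setup (overridable for testing).
LE_BASE=${LE_BASE:-/srv/letsencrypt.torproject.org}

# Print the path patterns that the rm command above targets for the
# domain given as $1. The first one is a glob and is printed unexpanded.
le_cert_paths() {
    printf '%s\n' \
        "$LE_BASE/var/result/$1*" \
        "$LE_BASE/var/certs/$1"
}
```

Running ls -d $(le_cert_paths tpa-bootstrap.torproject.org) (unquoted on purpose, so the shell expands the glob) shows the matching files before anything is deleted.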

Harica

To remove a no-longer needed Harica certificate, eg. for an onion service:

  • On the Puppet server, locate the certificate at /srv/puppet.torproject.org/from-harica
  • Delete the <onion>.* files

How-to

Certificate management via puppet

We can request (LE-signed) SSL certificates using dehydrated::certificate. Certificates can also be requested by adding them to the dehydrated::certificates hiera key. Adding more hosts to the SAN set is also supported.

The certificate will be issued and installed after a few puppet runs on the requesting host and the dehydrated_host (nevii). The upstream puppet module has documented this reasonably well.

On nevii, puppet-dehydrated runs a cron job to regularly request and update the certificates that puppet wants. See /opt/dehydrated/requests.json for the requested certs, status.json for issuance status and potential errors and issues.

The glue between puppet and our dns building setup is in the hook script we deploy in profile::dehydrated_host (it's the same le-hook our letsencrypt-domain.git stuff uses, with a slightly different config).

Our zones need to include /srv/dehydrated/var/hook/snippet so we publish the responses to the LE verification challenge in DNS. We copied the previous LE account, so our old CAA record is still appropriate.

Wait to configure a service in puppet until it has a cert

In puppet code, you can check whether the certificate is already available and make various puppet code conditional on that. We can use the ready_for_merge fact, which tells puppet-dehydrated it can build the fullchain_with_key concat because all the parts are in place.

$dn = $trusted['certname']
dehydrated::certificate { $dn: }
$ready_for_config =  $facts.dig('dehydrated_domains', $dn, 'ready_for_merge')

Once $ready_for_config evaluates to true, the cert is available in /etc/dehydrated at (among other places) /etc/dehydrated/certs/${dn}_fullchain.pem with its key in /etc/dehydrated/private/${dn}.key. There also is a /etc/dehydrated/private/${title}_fullchain_with_key.pem file.

Reload services on cert updates

If you want to refresh a service when its certificate got updated, you can use something like this for instance:

dehydrated::certificate { $service_name: }
~> Class['nginx::service']

Copy the key/cert to a different place

To copy the key, and maybe also the certificate, to a different place and user, this works for weasel's home assistant setup at home:

$key_dir = $facts['dehydrated_config']['key_dir']
$key_file = "${key_dir}/${domain}.key"

$crt_dir = $facts['dehydrated_config']['crt_dir']
$crt_full_chain = "${crt_dir}/${domain}_fullchain.pem"

file { '/srv/ha-share/ssl':
  ensure => directory,
  owner  => 'root',
  group  => 'ha-backup',
  mode   => '0750',
}

Dehydrated_key[ $key_file ]
-> file { "/srv/ha-share/ssl/${domain}.key":
  ensure => file,
  owner  => 'root',
  group  => 'ha-backup',
  mode   => '0440',
  source => $key_file,
}

Concat[ $crt_full_chain ]
-> file { "/srv/ha-share/ssl/${domain}.crt":
  ensure => file,
  owner  => 'root',
  group  => 'ha-backup',
  mode   => '0440',
  source => $crt_full_chain,
}

If this becomes a common pattern, we should abstract this into its own defined type.

Pager playbook

Digicert validation emails

If you get email from DigiCert Validation, ask the Tor Browser team: they use it to sign code (see "Design" below for more information about which CAs are in use).

Waiting for master to update

If a push to the Let's Encrypt repository loops on a warning like:

remote: Waiting for master to update torproject.net (for _acme-challenge.pages.torproject.net) from 2021012804.  Currently at 2021012804..

It might be because the Let's Encrypt hook is not really changing the zonefile, and not incrementing the serial number (as hinted above). This can happen if you force-push an empty change to the repository and/or a previous hook failed to get a cert or was interrupted.

The trick then is to abort the above push, then manually edit (yes) the generated zonefile (for the torproject.net domain, in the above example):

$EDITOR /srv/dns.torproject.org/var/generated/torproject.net

... and remove the _acme-challenge line. Then you should somehow update the zone with another, unrelated change, to trigger a serial number change. For example, you could add a random TXT record:

ynayMF5xckel8uGpo0GdVEQjM7X9    IN TXT "random record to trigger a zone rebuild, should be removed"

And push that change (in dns/domains.git). Then the serial number will change, and the infrastructure will notice the _acme-challenge record is gone. Then you can re-do the certification process and it should go through.

Don't forget to remove the random TXT record created above once everything is done.
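To confirm that the serial number actually moved, the SOA record can be queried directly on the primary. The helpers below are a sketch with made-up function names; they rely only on the standard dig +short SOA output, where the serial is the third field.

```shell
# Extract the serial (third field) from `dig +short SOA` output on stdin.
soa_serial() {
    awk '{ print $3 }'
}

# Print the current serial of zone $1 as served by name server $2.
zone_serial() {
    dig +short SOA "$1" "@$2" | soa_serial
}
```

For the example above, zone_serial torproject.net nevii.torproject.org should print something newer than 2021012804 once the unrelated change has been pushed.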

Challenge is invalid!

If you get an email that looks like:

Subject: Cron <letsencrypt@nevii> sleep $(( RANDOM % 3600 )) && chronic dehydrated-wrap --cron

[...]

Waiting for master to update torproject.org (for _acme-challenge.dip.torproject.org) from 2021021304.  Currently at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
 SOA nevii.torproject.org. hostmaster.torproject.org. 2021021305 10800 3600 1814400 3601 from server 49.12.57.135 in 0 ms.
 SOA nevii.torproject.org. hostmaster.torproject.org. 2021021304 10800 3600 1814400 3601 from server 194.58.198.32 in 11 ms.
 SOA nevii.torproject.org. hostmaster.torproject.org. 2021021305 10800 3600 1814400 3601 from server 95.216.159.212 in 26 ms.
 SOA nevii.torproject.org. hostmaster.torproject.org. 2021021305 10800 3600 1814400 3601 from server 89.45.235.22 in 29 ms.
 SOA nevii.torproject.org. hostmaster.torproject.org. 2021021305 10800 3600 1814400 3601 from server 38.229.72.12 in 220 ms.
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
Waiting for master to update torproject.org (for _acme-challenge.gitlab.torproject.org) from 2021021304.  Currently at 2021021305..
Waiting for secondaries to update to match master at 2021021305..
 + Responding to challenge for dip.torproject.org authorization...
 + Cleaning challenge tokens...
 + Challenge validation has failed :(
ERROR: Challenge is invalid! (returned: invalid) (result: ["type"]	"dns-01"
["status"]	"invalid"
["error","type"]	"urn:ietf:params:acme:error:dns"
["error","detail"]	"During secondary validation: DNS problem: query timed out looking up CAA for torproject.org"
["error","status"]	400
["error"]	{"type":"urn:ietf:params:acme:error:dns","detail":"During secondary validation: DNS problem: query timed out looking up CAA for torproject.org","status":400}

It's because the DNS challenge took too long to deploy and it was refused. This is harmless: it will eventually succeed. Ignore the message, or, if you want to make sure, run the cron job by hand:

ssh -tt root@nevii.torproject.org sudo -u letsencrypt /srv/letsencrypt.torproject.org/bin/dehydrated-wrap --cron
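When the email is long, it can be tedious to spot which secondary is lagging behind in the SOA lines. The filter below is a sketch (the lagging_servers name is made up) that assumes the exact line layout shown in the email above, with the serial in the fourth field and the server address in the eleventh.

```shell
# Read hook output on stdin and print the address of every server whose
# SOA serial differs from the expected serial given as $1.
# ASSUMPTION: lines look like
#   SOA <mname> <rname> <serial> ... from server <address> in <n> ms.
lagging_servers() {
    awk -v want="$1" '$1 == "SOA" && $4 != want { print $11 }'
}
```

Fed the output above with an expected serial of 2021021305, this prints 194.58.198.32, the one server still at 2021021304.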

db.torproject.org is WARNING: Certificate will expire

This message indicates the upcoming expiration of the OpenLDAP self-signed TLS certificate.

See service/ldap#server-certificate-renewal for instructions on how to renew it.

Disaster recovery

No disaster recovery plan yet (TODO).

Reference

Installation

There is no documentation on how to deploy this service from scratch. To deploy a new cert, see the above section and the ssl::service Puppet resource.

SLA

TLS is critical and should be highly available when relevant. It should fail closed, that is if it fails a security check, it should not allow a connection.

Design

TLS is one of two major transport security protocols used at TPA (the other being service/ipsec). It is used by web servers (Apache, HAProxy, Nginx), backup servers (Bacula), mail servers (Postfix), and possibly more.

Certificate generation is done by git hooks for Let's Encrypt or by a makefile and cron job for auto-ca, see below for details.

Certificate authorities in use at Tor

This document mostly covers the Let's Encrypt certificates used by websites and other services managed by TPA.

But there are other certificate authorities in use inside TPA and, more broadly, at Tor. Here's the list of known CAs in operation at the time of writing (2020-04-15):

  • Let's Encrypt: automatically issues certificates for most websites and domains, managed by TPA
  • Globalsign: used by the Fastly CDN used to distribute TBB updates (cdn-fastly.torproject.org)
  • Digicert: used by other teams to sign software releases for Windows
  • Harica: used for HTTPS on the donate.tpo onion service
  • Puppet: our configuration management infrastructure has its own X.509 certificate authority which allows "Puppet agents" to authenticate and verify the "Puppet Master", see our documentation and upstream documentation for details
  • LDAP: our OpenLDAP server uses a custom self-signed x.509 certificate authority that is distributed to clients via Puppet, see the documentation for instructions to renew this certificate manually
  • internal "auto-ca": all nodes in Puppet get their own X.509 certificate signed by a standalone, self-signed X.509 certificate, documented below. It is used for backups (Bacula) and mail delivery (Postfix)
  • Ganeti: each cluster has a set of self-signed TLS certificates in /var/lib/ganeti/*.pem, used in the API and other. There is talk of having a cluster specific CA but it has so far not been implemented
  • contingency keys: three public/private RSA key pairs stored in the TPA password manager (in ssl-contingency-keys) that are part of the preloaded allow list shipped by Google Chrome (and therefore Firefox), see tpo/tpa/team#41154 for a full discussion on those

See also the alternative certificate authorities we could consider.

Certificate Authority Authorization (CAA)

torproject.org and torproject.net implement CAA records in DNS to restrict which certificate authorities are allowed to issue certificates for these domains and under what restrictions.

For Let's Encrypt domains, the CAA record also specifies which account is allowed to request certificates. This is represented by an "account uri", and is found among certbot and dehydrated configuration files. Typically, the file is named account_id.json.
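As an illustration of what such a record looks like, the sketch below formats a CAA "issue" record with the accounturi parameter defined in RFC 8657. The caa_record function name and the account number used in the example are made up; the real account URI is the one found in the ACME client configuration.

```shell
# Format a CAA "issue" record that restricts issuance to one ACME account.
#   $1: domain name, $2: CA identifier, $3: ACME account URI
caa_record() {
    printf '%s. IN CAA 0 issue "%s; accounturi=%s"\n' "$1" "$2" "$3"
}
```

For example, caa_record torproject.org letsencrypt.org https://acme-v02.api.letsencrypt.org/acme/acct/12345 prints a zone file line for a hypothetical account 12345, and dig +short CAA torproject.org shows what is currently published.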

Internal auto-ca

The internal "auto-ca" is a standalone certificate authority running on the Puppet master (currently pauli), in /srv/puppet.torproject.org/auto-ca.

The CA runs based on a Makefile which takes care of creating, revoking, and distributing certificates to all nodes. Certificates are valid for a year (365 days, actually). If a certificate is going to expire in less than 30 days, it gets revoked and removed.

The makefile then iterates over the known hosts (as per /var/lib/misc/thishost/ssh_known_hosts, generated from service/ldap) to create (two) certificates for each host. This makes sure certs get renewed before their expiry. It will also remove certificates from machines that are not known, which is the source of the revoked client emails TPA gets when a machine gets retired.

The Makefile then creates two certificates per host: a "clientcert" (in clientcerts/) and a "server" (?) cert (in certs/). The former is used by Bacula and Postfix clients to authenticate with the central servers for backups and mail delivery, respectively. The latter is used by those servers to authenticate to their clients but is also used as default HTTPS certificates on new apache hosts.

Once all certs are created, revoked, and/or removed, they get copied into Puppet's "$vardir", in the following locations:

  • /var/lib/puppetserver/auto-ca/certs/: server certs
  • /var/lib/puppetserver/auto-ca/clientcerts/: client certs.
  • /var/lib/puppetserver/auto-ca/clientcerts/fingerprints: colon-separated SHA256 fingerprints of all "client certs", one per line
  • /var/lib/puppetserver/auto-ca/certs/ca.crt: CA's certificate
  • /var/lib/puppetserver/auto-ca/certs/ca.crl: certificate revocation list

In order for these paths to be available during catalog compilation, each environment's modules/ssl/files is a symlink to /var/lib/puppetserver/auto-ca.

This work gets run from the Puppet user's crontab, which calls make -s install every day.
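To check by hand whether a given client certificate is present in the fingerprints file listed above, its SHA256 fingerprint can be computed with openssl and compared. This is a sketch with made-up function names, and it assumes the file uses the same uppercase, colon-separated form that openssl prints.

```shell
# Reduce `openssl x509 -fingerprint` output ("... Fingerprint=AA:BB:...")
# on stdin to the bare colon-separated fingerprint.
bare_fingerprint() {
    sed 's/^[^=]*=//'
}

# Print the SHA256 fingerprint of the certificate file given as $1.
cert_fingerprint() {
    openssl x509 -noout -fingerprint -sha256 -in "$1" | bare_fingerprint
}
```

A lookup could then be written as grep -qxF "$(cert_fingerprint /path/to/client.crt)" /var/lib/puppetserver/auto-ca/clientcerts/fingerprints, which succeeds only if the exact fingerprint appears on a line of its own.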

Let's encrypt workflow

When you push to the git repository on the primary DNS server (currently nevii.torproject.org):

  1. the post-receive hook runs dehydrated-wrap --cron with a special BASE variable that points dehydrated at our configuration, in /srv/letsencrypt.torproject.org/etc/dehydrated-config

  2. Through that special configuration, the dehydrated command is configured to call a custom hook (bin/le-hook) which implements logic around the DNS-01 authentication challenge, notably adding challenges, bumping serial numbers in the primary nameserver, and waiting for secondaries to sync. Note that there's a configuration file for that hook in /etc/dsa/le-hook.conf.

  3. The le-hook also pushes the changes around. The hook calls the bin/deploy file which installs the certificates files in var/result.

  4. CODE REMOVED: It also generates a Public Key Pin (PKP) hash with the bin/get-pin command and appends Diffie-Hellman parameters (dh-$size.pem) to the certificate chain.

  5. It finally calls the bin/push command which runs rsync to the Puppet server, which in turn hardcodes the place where those files are dumped (in pauli:/srv/puppet.torproject.org/from-letsencrypt) through its authorized_keys file.

  6. Finally, those certificates are collected by Puppet through the ssl module. Pay close attention to how the tor-puppet/modules/apache2/templates/ssl-key-pins.erb template works: it will not deploy key pinning if the backup .pin file is missing.

Note that by default, the dehydrated config includes PRIVATE_KEY_RENEW="no" which means private keys are not regenerated when a new cert is requested.

Issues

There is no issue tracker specifically for this project. File or search for issues in the team issue tracker with the ~TLS label.

Monitoring and testing

When a HTTPS certificate is configured on a host, it is automatically monitored by default, through the ssl::service resource in Puppet.

Logs and metrics

Other documentation

TLS and X.509 form a vast application domain with lots of documentation.

TODO: identify key TLS docs that should be linked to here. RFCs? LE upstream docs?

The letsencrypt-domains.git repository is actually a fork of the "upstream" project, from Debian System Administrators (DSA), see the upstream git repository for more information.

Discussion

Overview

There are no plans to make major changes to the TLS configuration, although a review of the cipher suites is in progress (as of April 2020). We should have mechanisms to do such audits on a more regular basis, and to facilitate changes to those configurations over the entire infrastructure.

Goals

TODO: evaluate alternatives to the current letsencrypt deployment systems and see if we can reduce the number of CAs.

Must have

Nice to have

Non-Goals

Approvals required

Proposed Solution

Cost

Alternatives considered

Puppet for cert management

We could move more certificate management tasks to Puppet.

ACME issuance

For ACME-compatible certificate authorities (really, just Let's Encrypt), we know about the following Puppet modules that could fit the bill:

  • bzed/dehydrated - from a Debian developer, uses dehydrated, weasel uses this for DNS-01 based issuance, creates CSR on client and cert on DNS server, converges over 4-6 runs

  • puppet/letsencrypt - from voxpupuli, certbot wrapper, issues certificates on clients

Worth noting is that currently, only certbot supports the onion-csr-01 challenge via the certbot-onion plugin, although adding support for it to dehydrated is not expected to be particularly difficult.

CA management

The auto-ca machinery could be replaced by Puppet code. Here are modules that might be relevant:

Trocla also has support for x509 certs although it assumes there is already a CA present, and it does not support EC keys.

We could also leverage the ACME protocol designed by Let's Encrypt to run our own CA instead of just OpenSSL, although that might be overkill.

In general, it would be preferable to reuse an existing solution rather than maintain our own software in Make.

Other Certificate Authorities

There are actually a few other ACME-compatible certificate authorities which issue free certificates. The https.dev site lists a few alternatives which are, at the time of writing:

HPKP

HPKP used to be used at Tor, but we expired it in March 2020 and completely stopped sending headers in October 2020. It is generally considered deprecated: it was disabled in Google Chrome in 2017 and should not be used anymore. See issue 33592 for details, and the history of this page for previous instructions.