NOTE: this document was a preliminary roadmap designed in the early days of the Tor / Tails merge, as part of a wider organizational feasibility study. It is kept for historical reference; the actual roadmap is now in TPA-RFC-73.

TPA/Tails sysadmins Overview

Deadlines:

  • May 15th: soft deadline.
  • May 30th: hard deadline, whatever is here will be merged on that day!

Minutes pad: https://pad.riseup.net/p/tortailsysadmin-3-T_hKBBTFwlnw6lieXO-keep

Executive Summary

The Tails sysadmins and Tor sysadmins (TPA) have been meeting weekly since April 9th to build a shared overview and establish a mutual working relationship. The weekly meetings have served to share knowledge of each organization's resources, infrastructure, roadmaps, and policies. Once a baseline understanding of the fundamentals was established, discussions turned to building a timeline for how a convergence of resources and responsibilities could work, as well as assessing the associated risks.

A collaborative and living document was created to document these details and is being iteratively improved for greater clarity, cohesion and understanding between the two groups: https://pad.tails.net/n7fKF9JjRhq7HkgN1z4uEQ

Timeline

We plan on operating as a single sysadmin team for both projects, starting with separate operations but progressively merging over the course of multiple years. Here's a high-level view of the timeline:

  • July 2024 (first month): Tails integrates in TPI at the administrative level, no systems change, anarcat on holiday
  • August 2024 (second month): Tails sysadmins integrate in TPA meetings
  • September 2024 (third month): Tails and TPA cross-train, merge shifts and admin access
  • Q4 2024 (fourth to sixth month): start reversible merges and retirements, policy review and finalize roadmap
  • January 2025 (after 6 months): Tails' exit strategy point of no return, irreversible merges start
  • 2025 (first year): mixed operations, at the end of the year, everyone can handle both systems
  • 2025-2030 (5 years): rough guesstimate of the time required to complete mergers

Service merges

Services and infrastructure will be either merged or retired, each time picking the best solution for a specific problem. For example, TPA has been considering switching to Borg as a backup system, which Tails is already using, so a solution here would be for TPA to retire its aging Bacula backup system in favor of Borg. Conversely, Tails has a GitLab instance that could usefully be merged into TPA's.

Costs

Tails currently has around $333.33 of monthly hardware expenses, $225.00/month of which are currently handled by TPI. Some of those costs could go down due to the merger.

TPA currently has around $2,250 of monthly hardware expenses, without amortization. Some of those costs could rise because of the merger.

Collaboration

Tails will adopt Tor's team lead structure, working inside TPA under anarcat's leadership.

Risks

TODO: just import the table here?

Resources and Infrastructure: Overview of resources, and an understanding of how resources will be handled

Tor

A bird's-eye view of everything can be seen in:

  • Tor Service list, which includes:
    • non-TPA services managed by other teams, which we call "service admins" (but note some of those are managed by TPA folks, e.g. GitLab)
  • Tor Machine list: ~90 machines, including about a dozen physical servers

The new-person guide has a good primer on services and infra as well (and, heck, much of the stuff here could be merged there).

History

Tor infrastructure was initially a copy of Debian's, built mostly by weasel (Peter Palfrader), who did that voluntarily from 2004 to about 2020. Paid staff started with hiro a little before that, with hiro doing part-time work until she switched to metrics. Anarcat joined in March 2019, lavamind in 2021.

There are lots of legacy things lying around: services not well documented, disconnected authentication, noisy or no monitoring.

But things also work: we push out ~2Gbps steady on the mirrors, host hundreds (if not thousands) of accounts in GitLab, and regularly publish Tor Browser releases for multiple platforms; the Tor network is alive and relatively well.

Authentication

There's an LDAP server but its design is rather exotic. Not many things are plugged into it; right now it's basically shell accounts and email. Git used to be plugged in, but we're retiring Gitolite and the replacement (GitLab) isn't.

We use OpenPGP extensively: it's the root of trust for new LDAP accounts, which are the basis for shell and email access, so it is essential.

All TPA members are expected to use cryptographic tokens (e.g. Yubikeys) to store their secret keys.

DNS

Everything is under torproject.org except third-party stuff, which goes under torproject.net, itself in the public suffix list to avoid cross-domain attacks. DNS is managed in a git repository, with reboot detection to rotate hosts in DNS automatically. DNSSEC is managed, with extensive TLSA and similar records.
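
As a rough illustration of that reboot-detection idea (not TPA's actual tooling), the sketch below probes candidate backends and emits A records only for hosts that respond, so a host that is down or rebooting drops out of the rotation; the service name, addresses, and output format are made up for the example.

    #!/usr/bin/env python3
    # Illustrative sketch only: probe candidate backends and emit A records
    # for the reachable ones, so rebooting hosts drop out of the rotation.
    # The name, addresses and zone format below are hypothetical.
    import socket

    CANDIDATES = {
        "service.example.org.": ["192.0.2.10", "192.0.2.20", "192.0.2.30"],
    }

    def reachable(addr, port=443, timeout=3.0):
        """Return True if a TCP connection to addr:port succeeds."""
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                return True
        except OSError:
            return False

    for name, addrs in CANDIDATES.items():
        healthy = [a for a in addrs if reachable(a)] or addrs  # never empty the record
        for a in healthy:
            print(f"{name}\t300\tIN\tA\t{a}")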

IP addressing

No registered IP blocks, all delegated by upstreams (Hetzner, Quintex). Allocations managed in upstream control panels or DNS reverse zones when delegated.

RFC1918 space allocation is all within 172.30.0.0/16, with 172.30.131.0/24, 172.30.135.0/24, and 172.30.136.0/24 currently in use. Those are reserved for private storage networks (e.g. DRBD), management interfaces, and VPN endpoints.

Monitoring

We're using Icinga but are switching over to Prometheus/Grafana, which is already deployed.

https://grafana.torproject.org/ user: tor-guest, no password.

Points of presence

  • Hetzner: Ganeti cluster in rented hardware, virtual machines, Germany, Finland
  • Quintex: Ganeti cluster on owned hardware, 2 build machines for the apps team, Texas, USA
  • Netnod: DNS secondary
  • Safespring (ex sunet): virtual machines in an OpenStack cluster, Sweden

Both sysadmins currently operate from Montreal, Canada.

Hardware

TPA manages a heterogeneous set of machines that is essentially running on an untrusted and unmanaged network. We have two Ganeti clusters:

  • gnt-dal: Dallas, Texas, hosted at Quintex, 3 beefy AMD machines, 15TiB memory, 24TiB NVMe and SSD storage, 384 cores, 150$USD/month per node, 450$ + 300$ for two tor browser build machines, so 750$/mth
  • gnt-fsn: Falkenstein, Germany (Hetzner), 8 aging Intel machines, 512GiB memory, 48TiB NVMe and HDD storage, 96 cores, ~1500EUR/month

See also the Ganeti health Grafana dashboard.

There are also VMs hosted here and there and of course a relatively large fleet of virtual machines hosted in the above Ganeti clusters.

Total costs: about 2250$/month.

  • gnt-dal: 40k / 667$/mth
  • backup server: 5k / 100$/mth
  • apps build servers: 11k / 200$/mth
  • total: 1000$/mth amortization

Costs overview

  • quintex: 150$ /U with unlimited gbit included, 5 machines so roughly 750$USD/mth
  • hetzner: 1600EUR/mth+, should be double-checked
  • total about ~2-3k/mth, not including other services like tails, riseup, domain fronting, and so on managed by other teams
  • not including free services like Fastly, a significant donation in kind, used only for Tor Browser upgrades (which go over Tor, of course)
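
As a rough sanity check of the "~2-3k/mth" total, the figures above roll up as follows; the EUR-to-USD conversion rate is an assumption for illustration, not a number from this document:

    # Rough monthly cost roll-up from the figures above.
    # The EUR/USD rate is an assumed value for illustration only.
    quintex_ganeti = 3 * 150        # gnt-dal nodes, 150$/U each
    quintex_builders = 300          # two Tor Browser build machines
    hetzner_eur = 1600              # gnt-fsn and friends, to be double-checked
    eur_to_usd = 1.08               # assumption

    total_usd = quintex_ganeti + quintex_builders + hetzner_eur * eur_to_usd
    print(f"~{total_usd:.0f}$/mth")   # ~2478$/mth, within the ~2-3k/mth estimate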

Secrets

  • passwords: stored in a git repository on the Puppet server, managed by password-store / OpenPGP / GnuPG, see password manager
  • TLS: multiple CAs, mostly let's encrypt but also internal, see service/tls
  • SSH: keys managed in LDAP and Puppet

Tails

History

Tails was first released in 2009, and our first physical server (Lizard) has existed for more than 10 years. For quite some time the infra was tightly integrated with servers self-hosted in the homes of some Tails folks, but we finally ditched those in 2022.

In 2019 we acquired a small, power-efficient backup server; in 2021, a dev server and two CI machines; and, more recently, another small, power-efficient server to provide redundancy for some servers.

In Tails, development and sysadmin work are fairly integrated. There has been work to separate things, but more needs to be done. For example, the Tails website lives in the main Tails repository, and the Weblate integration automatically feeds translations to the website via the main repository.

Authentication

  • shell access to our infra is granted solely through puppet-rbac
  • permissions on gitlab are role-based and managed solely through gitlabracadabra (we still need to sync these roles with the ones in puppet-rbac)
  • 2FA is mandatory for access to private gitlab projects

DNS

For several years Tails used tails.boum.org and subdomains for applications and @boum.org for email, then bought tails.net in 2022. So far, only the website was moved there, and we have plans to start using it for email soon.

We have 2 PowerDNS servers; zones are managed manually via pdnsutil edit-zone ZONE on the primary server, and the database is replicated to the secondary server.

IP addressing

No registered IP blocks, all delegated by upstreams (SEACCP, Coloclue, Tachanka, PauLLA, Puscii). We have no control over allocation.

RFC1918 allocations are mostly within 192.168.0.0/16, with the blocks 192.168.122.0/24, 192.168.126.0/24, 192.168.127.0/24, 192.168.132.0/24, 192.168.133.0/24, and 10.10.0.0/24 currently in use.
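
The book excerpt at the end of this document recommends verifying that merging organizations' RFC1918 space doesn't overlap; as a quick illustration (not an existing tool on either side), the allocations listed in this document can be checked like this:

    #!/usr/bin/env python3
    # Minimal overlap check between TPA's and Tails' RFC1918 allocations,
    # using the blocks listed in this document. Not an existing tool.
    from ipaddress import ip_network
    from itertools import product

    TPA = ["172.30.131.0/24", "172.30.135.0/24", "172.30.136.0/24"]
    TAILS = ["192.168.122.0/24", "192.168.126.0/24", "192.168.127.0/24",
             "192.168.132.0/24", "192.168.133.0/24", "10.10.0.0/24"]

    overlaps = [(a, b) for a, b in product(TPA, TAILS)
                if ip_network(a).overlaps(ip_network(b))]
    print("overlaps:", overlaps or "none")   # expected: none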

Monitoring

We use Icinga2 and email, but some of us would love to have nice Grafana dashboards and log centralization.

Points of presence

  • SEACCP: 3 main physical servers (general services and Jenkins CI), USA.
  • Coloclue: 2 small physical servers for backups and some redundancy, Netherlands.
  • PauLLA: dev server, France.
  • Puscii: VM for secondary DNS, Netherlands.
  • Tachanka!: VMs for monitoring and containerized services, USA, somewhere else.

Sysadmins currently operate from the Netherlands and Brazil.

Infrastructure map

Diagram of the Tails infrastructure showing 5 points of presence joined by a VPN over the Internetz, with 3 servers joined by a VLAN at SEACCP with lots of VMs, then the rest a collection of VMs and physical hosts

(Source file)

Hardware

At SEACCP (US):

  • lizard: Intel Xeon, 256 GiB memory, 6TiB disk, 48 cores
  • iguana: AMD Ryzen, 128 GiB memory, 1.8TiB disk, 16 cores
  • dragon: AMD Ryzen, 128 GiB memory, 1.8TiB disk, 24 cores

At Coloclue (Netherlands):

  • stone: AMD low power, 4GiB memory, 14.55TiB disk, 4 cores
  • chameleon: ?

Costs overview

Tails has a mix of physical machines, virtual machines, and services hosted by trusted third parties:

Name        Type      Purpose                             Hosted by       Cost/year  Paid by
dragon      physical  Jenkins executor                    SeaCCP          $900       Tor
iguana      physical  Jenkins executor and GitLab Runner  SeaCCP          $900       Tor
lizard      physical  main server                         SeaCCP          $900       Tor
ecours      virtual   monitoring                          Tachanka!       180€       Tails
gecko       virtual   run containerized apps              Tachanka!       180€       Tails
skink       physical  test server                         PauLLA          0          n/a
stone       physical  backups                             ColoClue        500€       Tails
chameleon   physical  mail and fallback server            ColoClue        600€       Tails
teels       virtual   secondary DNS                       PUSCII          180€       Tails
Schleuder   service   encrypted mailing lists             PUSCII          60€        Tails
GitLab      service   code hosting & project management   immerda.ch      300€       Tails
Mailman     service   cleartext mailing lists             Autistici       0          n/a
BitTorrent  service   tracker                             torrent.eu.org  240€       Tails

Total cost per year:

  • currently paid by Tor: $2,700
  • currently paid by Tails: 1,320 EUR

Amortization: 333.33$/mth, one server to replace already.

Secrets

Infra-related secrets are stored in either:

  • hiera-eyaml (public, PKCS7 encrypted)
  • password-store (private, OpenPGP encrypted)

TLS managed through a Puppet module and Let's Encrypt HTTP-01 authentication.

Main self-hosted services

Highly specific to Tails' needs:

  • Reprepro: APT repositories with:
    • snapshots of the Debian archive: release and reproducible builds
    • tails-specific packages
  • Weblate: translation of our website
  • Jenkins: automated builds and tests
  • Gitolite: Mostly CI-related repositories and some legacy stuff
  • Ikiwiki, NGINX: website
  • Whisperback: onion service running an MTA to receive tails whisperback reports

Mostly generic:

  • Bitcoind
  • Transmission: seeding image torrents
  • Icinga2: infrastructure monitoring
  • LimeSurvey: surveys
  • Schleuder: encrypted mailing lists
  • Mirrorbits: download redirector to mirrors
  • Hedgedoc
  • PowerDNS
  • XMPP bot

TPA / Tails service mapping

See roadmap below.

Policies

We have a data storage policy. We're in the process of doing a risk assessment to determine further policy needs.

Sysadmins are required to adhere to security policies Level A and Level B.

There are quite a few de facto policies that are not explicitly documented in one place, such as:

  • we try to adhere to the roles & profiles paradigm
  • all commits to our main Puppet repository are PGP signed

Roadmaps: Review of each team's open roadmaps, and outlook of the steps needed for the merger

TPA Roadmap

Big things this year:

  • mail services rebuild
  • nagios retirement
  • gitolite retirement (should be completed soon)
  • Debian bookworm upgrades
  • 2 new staff onboarding (sysadmin and web)
  • figure out how we organize web work
  • possible sponsor work for USAGM to get onion services deployed and monitored
  • might still be lacking capacity because of the latter and the merger

Tails Roadmap

Our roadmap is a bit fuzzy because of the potential merge, but this is some of the more important stuff:

  • the periodic upgrading of Jenkins and Puppet modules
  • secrets rotation
  • finalising risk assessment, establishing policies, emergency protocols, and working on mitigations
  • adding redundancy to critical services (website, APT repositories, DNS, Rsync, etc)
  • migrating e-mail and other web applications from tails.boum.org to tails.net
  • various improvements to dev experience in Jenkins and GitLab CI, including some automation of workflows and integration between both (a complete migration to GitLab CI has not yet been decided)
  • improving internal collaboration by increasing usage of "less techy" tools

Wishlist that could maybe benefit from merging infras:

  • Migrating backups to borg2 (once it's released)
  • Building and deploying the Tails website from GitLab CI (ongoing, taking into account Tor's setup)
  • Several improvements to monitoring, including nice grafana dashboards and log centralization
  • Building and storing container images

Merger roadmap

Tails services are split into three groups:

  • low complexity: those services are no-brainers. Either we keep the Tails service as is (and even start using it inside TPA/Tor!) or it gets merged with a Tor service (or vice-versa)
  • medium complexity: those are trickier: either they require a lot more discussion and analysis to decide, or Tails has already decided, but it's more work than just flipping a switch
  • high complexity: those are core services that are already complex on one or both sides but that we still can't manage separately in the long term, so we need to make some hard choices and lots of work to merge

The timeline section details when each will happen as we get experience and onboard Tails services and staff. The further along we move in the roadmap, the more operations become merged.

The low/medium/high complexity pattern is from TPA's Debian major upgrade procedures and allows us to batch things together. The bulk of that work, of course, is "low" and "medium" work, so it's possible it doesn't map as well here, but hopefully we'll still have at least a couple of "low" complexity services we can quickly deal with.

It also matches the adjectives used in the Jacob Kaplan-Moss estimation techniques, and that is not a coincidence either.

The broad plan is to start by onboarding Tails inside TPI, then TPA, then getting access to each other's infrastructure, learning how things work, and slowly starting to merge and retire services, over the course of multiple years. For the first month, nothing will change for Tails at the systems level; after that, Tails sysadmins will onboard inside TPA and progressively start taking up TPA work (and vice versa). Tails will naturally start by prioritising Tails infra (and same for TPA), with the understanding that we will eventually merge those priorities. For the first 6 months, only reversible changes will be made, but after that, more drastic changes will start.

Low complexity

  • bitcoind: retire (move to btcpayserver)
    • more a finance than a sysadmin issue
    • maybe empty Tails' wallet and then migrate the private key to whatever Tor uses
    • rationale: taking care of money won't be our job anymore
  • bittorrent: keep (Tails uses that for seeding images for the first time)
  • calendars: move from zimbra to nextcloud
    • tor: nextcloud
    • tails: zimbra
  • git-annex: migrate to GitLab LFS or keep?
    • FT needs to decide what to do here
    • rationale: gitlab doesn't support git-annex
    • careful here: LFS doesn't support partial checkouts!
  • Documentation: merge
    • tails:
      • single ikiwiki site?
      • public stuff is mostly up to date, some of it points to Puppet code
      • private stuff needs some love but should be quick to update
      • rewrite on the fly into tor's doc as we merge
    • tor:
      • multiple GitLab wikis spread around teams among different projects (also known as "the wiki problem")
      • multiple static site generators (lektor, hugo, mkdocs) in use for various sites
      • see also documentation on documentation
      • TPA wiki used to be a ikiwiki, but was dropped to reduce the number of tools in use, considering switching to mkdocs, hugo, or (now) ikiwiki as a replacement because GitLab wikis are too limited (not publicly writable, no search without GitLab Ultimate, etc)
  • hedgedoc: keep as is!
  • IP space: keep as is (there's no collision), depends on colo
  • meeting reminder: retire
    • rationale: all current reminders would either become obsolete (CoC, Reimbursements) or could be handled via calendar (FT meeting)
  • password management: merge into TPA's password-store
    • tor:
      • password store for TPA
      • vault warden in testing for the rest of the org
    • tails: password-store
  • schleuder: merge TPA's into the Tails server (currently administered by non-TPA)
  • tor bridge: retire?
    • to discuss with FT (they may use it for testing)
    • issue is TPA/TPI can't run tor network infra like this; there are some rare exceptions (e.g. network team has relay-01.torproject.org, a middle relay research node)
  • whisperback: keep
    • it's fundamental for the Tails product and devs love it
  • xmpp bot: keep?
    • depends on discussion about IM below

Medium complexity

  • APT (public) repositories (reprepro): merge

    • tor
      • deb.torproject.org (hosts tor-little-t packages, maybe tor browser eventually)
    • tails
      • deb.tails.boum.org
    • Notes:
      • we're explicitly not including db.torproject.org in this proposal as it serves a different purpose than the above
      • there are details to discuss (for example, whether Tor is happy to include a patched Ikiwiki in their repo)
      • will need a separate component or separate domain for tails since many packages are patched versions specifically designed for tails (ikiwiki, cryptsetup, network-manager)
  • backups: migrate to borg?

    • tor:
      • aging bacula infrastructure
      • puppetized
      • concerns about backup scalability, some servers have millions of files and hundreds of gigabytes of data
    • tails:
      • shiny new borg things
      • puppetized
    • first test borg for a subset of Tor servers to see how it behaves, using tails' puppet code, particularly collector/onionoo servers (see the sketch at the end of this list)
    • need a plan for compromised servers scenarios
  • colocation: merge, maybe retire some Tails points of presence if they become empty with retirements/merges

    • tor: hetzner, quintex, sunet
    • tails: seaccp, coloclue, tachanka, paulla, puscii
    • Notes:
      • tails not too happy about the idea of ditching solidarity hosting (and thus funding comrades) in favor of commercial entities
      • it's pretty nice to have a physical machine for testing (the one at paulla)
      • TPA open to keeping more PoPs, the more the merrier; main concerns are documentation, the general challenge of onboarding new staff, and redundant services (e.g. we might want to retire the DNS server at puscii or the backup server at coloclue; keep in mind DNS servers sometimes get attacked with massive traffic, so puscii might want us out of there)
  • domain registration: merge (to njalla? to discuss)

    • tor: joker.com
    • tails: njalla
  • GitLab: merge into TPA, adopt gitlabracadabra for GitLab admins?

    • Tor:
      • self-hosted GitLab omnibus instance
      • discussions of switching to GitLab Ultimate
      • scalability challenges
      • storage being split up in object storage, multiple servers
      • multiple GitLab CI runners, also to be scaled up eventually
      • system installation managed through Puppet; projects, access control, etc. managed manually
    • Tails:
      • hosted at immerda
      • no shell access
      • managed through gitlabracadabra
    • Notes:
      • tails has same reservations wrt. ditching solidarity collectives as with colocation
  • gitolite: retire

    • Tor:
      • retirement of public gitolite server completed
      • private repositories that could not be moved to GitLab (Nagios, DNS, Puppet remaining) were moved to isolated git repos on those servers, with local hooks, without gitolite
    • Tails
      • some private repos that can easily be migrated
      • some repos that use git-annex (see above)
      • some repos that have git hooks we have yet to replace with gitlab-ci stuff
  • instant messaging: merge into whatever new platform will come out of the lisbon session

    • tails: jabber
    • tor: IRC, some Matrix, session in Lisbon to discuss next steps
  • limesurvey: merge into Tails (or vice versa)?

    • tails uses it for mailing, but we would ditch that functionality in favor of Tor's CRM
  • mail: merge

    • tor:
      • MTA only (no mailboxes for now, but may change)
      • Mailman 2 (to upgrade!!)
      • Schleuder
      • monthly CiviCRM mass mailings (~200-300k recipients)
      • core mail server still running buster because of mailman
      • see TPA-RFC-44 for the last architecture plan, to be redone (TPA-RFC-45)
    • tails
      • boum.org mailrouting is a fucking mess, currently switching to tails.net
      • MTA only
      • schleuder at puscii
      • mailman at autistici
  • rsync: keep until mirror pools are merged, then retire

  • TLS: merge, see puppet

    • tor:
      • multiple CAs
      • mostly LE, through git
    • tails: LE, custom puppet module
  • virtualization: keep parts and/or slowly merge into ganeti?

    • tor:
      • ganeti clusters
      • was previously using libvirt, implemented some mass-migration script that could be reused to migrate away from libvirt again
    • tails:
      • libvirt with a custom deploy script
      • strict security requirements for several VMs (jenkins builders, www, rsync, weblate, ...):
        • no deployment of systems where contributors outside of core team can run code (eg. CI runners) for some VMs
        • no TCP forwarding over SSH (even though we want to revisit this decision)
        • only packages from Debian (main) and Tails repositories, with few exceptions
      • build machines that run jenkins agents are full and don't have spare resources
      • possibility: first move to GitLab CI, then wipe our 2 jenkins agents machines, then add them to Ganeti cluster (:+1:)
      • this will take a long time to happen (maybe high complexity?)
  • web servers: merge into TPA? to discuss

    • tor:
      • mix of apache and nginx
      • voxpupuli nginx puppet module + profiles
      • custom apache puppet module
    • tails:
      • mix of apache and nginx
      • voxpupuli nginx puppet module
      • complexity comes from Ikiwiki: ours is patched and causes a feedback loop back to tails.git
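
As mentioned in the backups item above, a first step could be to test Borg against a subset of Tor servers. A minimal sketch of such a throwaway test run on a single host could look like the following; the repository location, the backed-up paths, and the unencrypted test repository are placeholder assumptions, and the real deployment would come from Tails' existing Puppet code rather than a script like this.

    #!/usr/bin/env python3
    # Throwaway Borg test run on one host: create a repo, take an archive,
    # and verify it. Paths and repo location are placeholders.
    import subprocess

    REPO = "/srv/backups/borg-test"              # placeholder test repository
    PATHS = ["/etc", "/var/lib/onionoo"]         # placeholder data to back up

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run("borg", "init", "--encryption=none", REPO)            # disposable test repo
    run("borg", "create", "--stats", "--compression", "lz4",  # first archive + stats
        f"{REPO}::test-{{now:%Y-%m-%d}}", *PATHS)
    run("borg", "check", REPO)                                # consistency check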

High complexity

  • APT (snapshot) repositories (reprepro): keep

    • tails
      • time-based.snapshots.deb.tails.boum.org
      • tagged.snapshots.deb.tails.boum.org
      • used for development
  • authentication: merge, needs a plan, blocker for puppetserver merge

    • tor: LDAP, mixed
    • tails: puppet-rbac, gitlabracadabra
  • DNS: migrate everything into a new simpler setup, blocker for puppetserver merge

    • tails: powerdns with lua scripts for downtime detection
    • tor: bind, git, auto-dns, convoluted design based on Debian, not well documented, see this section
    • migrate to either tor's configuration or, if impractical, use tails' powerdns as primary
  • firewalls: merge, migrate both codebases to puppetized nftables, blocker for puppetserver merge

    • tor: ferm, want to migrate to nftables
    • tails: iptables with puppet firewall module
  • icinga: retirement, migration to Prometheus, blocker for puppetserver merge

    • tails merges tor's puppet code
  • ikiwiki: keep? to discuss

    • tails:
      • automation of translation is heavily dependent on ikiwiki right now
      • templating would need to be migrated
      • we're unsure what to replace it with and what the potential benefits would be
      • splitting the website out of tails.git seems more important, as it would allow giving access to the website independently of the product
      • it'd be good to be able to grant people with untrusted machines access to post news items on the site and/or work on specific pages
  • jenkins: retire, move to GitLab CI, blocker for VPN retirement

    • tails
      • moving very slowly towards gitlab-ci, this is mostly an FT issue
      • probably a multi-year project
    • tor
  • mirror pool: merge? to discuss

    • tor: complex static mirror system
    • tails:
      • mirrorbits and volunteer-run mirrors
      • would like to move to mirrors under our own control because people often don't check signatures
      • groente is somewhat scared of tor's complex system
  • puppet: merge, high priority, needs a plan

    • tor:
      • complex puppet server deeply coupled with icinga, DNS, git
      • puppet 5.5 server, to be upgraded to 7 shortly
      • aging codebase
      • puppetfile, considering migrating to submodules
      • trocla
    • tails:
      • puppet 7 codebase
      • lots of third-party modules (good)
      • submodules
      • hiera-eyaml
      • signed commits
      • masterless backup server
    • how to merge the two puppet servers?! Ideas:
      • puppet in dry run against the new puppet server?
      • TPA needs to upgrade their puppet server and clean up their code base first? including:
        • submodules
        • signed commits + verification?
      • depends tightly on decisions around authentication
      • step by step refactor both codebases to use the same modules, then merge codebases, then refactor to use the same base profiles
      • most tails stuff is already under the ::tails namespace, this makes it a bit easier to merge into 1 codebase
      • make a series of blockers (LDAP, backups, TLS, monitoring) to operate a codebase merge on first
      • roadmap is: merge code bases first, then start migrating servers over to a common, merged puppetserver (or tor's, likely the latter unless miracles happen in LDAP world)
  • Security policies: merge, high priority, as guidelines are needed for what can be merged/integrated and what not

    • tails:
      • currently doing risk-assessment on the entire infra, will influence current policies
      • groente to be added to security@tpo alias, interested in a security officer role
    • tor:
    • outcome
      • TPA and tails need to agree on a server access security policy
  • weblate: merge

    • Tails:
      • tails weblate has some pretty strict security requirements as it can push straight into tails.git!
      • weblate automatically feeds the website via integration scripts using weblate Python API...
      • ... which automatically feeds back weblate after Ikiwiki has done its things (updating .po files)
      • the setup currently depends on Weblate being self-hosted
    • tor: https://hosted.weblate.org/projects/tor/
      • sync'd with GitLab CI
      • needs a check-in with emmapeel but should be mergeable with tails?
  • VPN: retire tails' VPN, blocker for jenkins retirement

    • tor:
      • couple of ipsec tunnels
      • mostly migrated to SSH tunnels and IP-based limits
      • considering wireguard mesh
    • tails:
      • tinc mesh
      • used to improve authentication on Puppet, monitoring
      • critical for Jenkins
    • chicken and egg re. Puppet merge

Timeline: Identify timelines for adjusting to convergences of resources and responsibilities

  • Early April: TPA informed of Tails merge project
  • April 15: start of weekly TPA/Tails meetings, draft of this document begins, established:
    • designate lead contact point on each side (anarcat and sysadmins@tails.net)
    • make network map and inventory of both sides
    • establish decision-making process and organisational structure
    • review RFC1918 IP space
  • May 15: soft deadline for delivering a higher level document to the Tor Board
  • May: meeting in Lisbon
    • 19-24: zen-fu
    • 20-25: anarcat
    • 20-29: lavamind
    • 21-23: Tor meeting
    • 23: actual tails/tor meeting scheduled in lisbon, end of day?
  • May 30: hard deadline, whatever is here will be merged in the main document on that day!
  • July: tentative date for merger, Tails integrates in TPI
    • anarcat on holiday
    • integration in TPI, basic access grants (LDAP, Nextcloud, GitLab user accounts, etc), no systems integration yet
    • during this time, the Tails people operate as normal, but start integrating into TPI (timetracking, all hands meetings, payroll, holidays, reporting to gaba while anarcat is away, etc.)
  • August (second month): onboarding, more access granted
    • lavamind on holiday
    • Begin 1:1s with Anarcat
    • 5-19 ("first two weeks"): soft integration, onboarding
    • GitLab access grants:
      • tails get maintainer access to TPA/Web GitLab repositories?
      • TPA gets access to Tails' GitLab server? (depends on when/if they get merged too)
  • September (end of first quarter): training, merging rotations and admin access
    • review security and privacy policies: merge tails security policies for TPA/servers (followup in tpo/tpa/team#41727)
      • review TPA root access list; we are asking root users for compliance instead
    • access grants:
      • merge password managers
      • get admin access shared across both teams
    • ongoing tails training to TPA infra (and vice-versa)
    • tails start work on TPA infra, and vice versa
      • tails enters rotation of the "star of the week"
      • TPA includes tails services in "star of the week" rotation
    • make a plan for GitLab Tails merge, possibly migrate the projects tails/sysadmin and tails/sysadmin-private
  • Q4 2024: policy review, finalize roadmap, start work on some merges
    • review namespaces and identities (domain names in use, username patterns, user management, zone management)
    • review access control policies (VPN, account names, RBAC)
    • review secrets management (SSH keys, OpenPGP keys, TLS certs)
    • review process and change management
    • review firewall / VPN policies done in https://gitlab.torproject.org/tpo/tpa/team/-/issues/41721
    • by the end of the year (2024), adopt the final service (merge/retirement) roadmap and draft timeline
    • work on reversible merges can begin as segments of the roadmap are agreed upon
  • Q4 2024 - Q3 2025 (first year): mixed operations
    • tails and TPA progressively train each other on their infra; by the end of the year, everyone can handle both infras
  • January 2025 (6 months): exit strategy limit, irreversible merges can start
  • Q4 2025 - Q3 2030 (second to fifth year): merged operations
    • service merges and retirements completion, will take multiple years

Questions: Document open questions

  • exact merger roadmap and final state remain to be determined, specifically:
    • which services will be merged with TPA infrastructure?
    • will (TPA or Tails) services be retired? which?
    • there is a draft of those, but no timeline, this will be clarified after the merger is agreed upon
  • what is tails' exit strategy, specifically: how long do we hold off from merging critical stuff like Puppet before untangling becomes impossible? see the "two months mark" above (line 566)
    • 6 months (= job security period)
  • TODO: make an executive summary (on top)
  • layoff mitigation? (see risk section below)
  • how do we prioritize tails vs non-tails work? (wrote a blurb at line 298, at the end of the merger roadmap introduction)
  • OTF grants can restrict what tails folks can work on, must reframe timeline to take into account the grant timeline (ops or tails negotiators will take care of this)
  • TODO: any other open questions?

Collaboration: Build a picture of how collaboration would work

First, we want to recognize that we're all busy and that an eventual merge is an additional work load that might be difficult to accomplish in the current context. It will take years to complete and we do not want to pressure ourselves to unrealistic goals just for the sake of administrative cohesion.

We acknowledge that there are different institutional cultures between the sysadmins at Tails and TPA. While the former has grown into a horizontal structure, without any explicit authority figure, the latter has a formal "authoritative" structure, with anarcat serving as the "team lead" and reporting to isabela, the TPI executive director.

Tails will comply with the "team lead" structure, with the understanding that we're not building a purely "top down" team where incompetent leaders micromanage their workers. On the contrary, anarcat sees his role as an enabler: keeping things organized, defusing conflicts before they happen, and generally helping team members get work done. A leader, in this sense, is someone who helps the team and individuals accomplish their goals. Part of the leader's work is to transmit outside constraints to the team; this often translates into new projects being parachuted into the team, particularly sponsored projects, and there is little the team can do about this. The team lead sometimes has the uncomfortable role of imposing this on the rest of the team as well. Ultimately, the team lead also might make arbitrary calls to resolve conflicts or decide technical direction.

We want to keep things "fun" as much as possible. While there are a lot of "chores" in our work, we will try as best as we can to share those equally. Both Tails and TPA already have weekly rotation schedules for "interrupts": Tails calls those shifts and TPA "star of the week", a term Tails has expressed skepticism about. We could rename this role "mutual interrupt shield" or just "shield" to reuse Limoncelli's vocabulary.

We also acknowledge that we are engineers first, and this is particularly a challenge for the team lead, who has no formal training in management. This is a flaw anarcat is working on, through personal research and, soon, ongoing training inside TPI. For now, his efforts center around "psychological safety" (see building compassionate software), which currently manifests as showing humility and recognizing his mistakes. A strong emphasis is made on valuing everyone's contributions, recognizing other people's ideas, letting go of decisions that are less important, and delegating as much as possible.

Ultimately, all of us were friends before (and through!) working together elsewhere, and we want to keep things that way.

Risks: Identify risks (and potential mitigations)

  • risk: institutional differences (tails more horizontal) may lead to friction and conflict
    mitigation: salary increases, see collaboration section
  • risk: existing personal friendships could be eroded due to conflicts inside the new team
    mitigation: get training and work on conflict resolution, separate work and play
  • risk: tails infra is closely entangled with the tails product
    mitigation: work in close coordination with the tails product team, patience, flexibility, disentangling
  • risk: TPA doesn't comply with tails security and data policies and vice versa
    mitigation: document issues, isolate certain servers, work towards common security policies
  • risk: different technical architectures could lead to friction
    mitigation: pick the best solution
  • risk: overwork might make merging difficult
    mitigation: spread timeline over multiple years, sufficient staff, timebox
  • risk: Tails workers are used to more diversity than just sysadmin duties and may get bored
    mitigation: keep possibility of letting team members get involved in multiple teams
  • risk: 5-person sysadmin team might be too large, and TPI might want to lay off people
    mitigation: get guarantees from operations that team size can be retained

Glossary

Tor

  • TPA: Tor Project sysAdmins, the sysadmin team
  • TPO: torproject.org
  • TPN: torproject.net, rarely used
  • TPI: Tor Project, Inc. the company employing Tor staff

Tails

  • FT: Foundations Team, Tails developers

A.10 Dealing with Mergers and Acquisitions

This is an excerpt from The Practice of System and Network Administration, a book about sysadmin things. I include it here because I think it's useful to our discussion and, in general, it's my (anarcat's) go-to book when I'm in a situation like this where I have no idea what I'm doing.

  • If mergers and acquisitions will be frequent, make arrangements to get information as early as possible, even if this means that designated people will have information that prevents them from being able to trade stock for certain windows of time.

  • If the merger requires instant connectivity to the new business unit, set expectations that this will not be possible without some prior warning (see the previous item). If connection is forbidden while the papers are being signed, you have some breathing room—but act quickly!

  • If you are the chief executive officer (CEO), involve your chief information officer (CIO) before the merger is even announced.

  • If you are an SA, try to find out who at the other company has the authority to make the big decisions.

  • Establish clear, final decision processes.

  • Have one designated go-to lead per company.

  • Start a dialogue with the SAs at the other company. Understand their support structure, service levels, network architecture, security model, and policies. Determine what the new support model will be.

  • Have at least one initial face-to-face meeting with the SAs at the other company. It’s easier to get angry at someone you haven’t met.

  • Move on to technical details. Are there namespace conflicts? If so, determine how you will resolve them—Chapter 39.

  • Adopt the best processes of the two companies; don’t blindly select the processes of the bigger company.

  • Be sensitive to cultural differences between the two groups. Diverse opinions can be a good thing if people can learn to respect one another—Sections 52.8 and 53.5.

  • Make sure that both SA teams have a high-level overview diagram of both networks, as well as a detailed map of each site’s local area network (LAN)—Chapter 24.

  • Determine what the new network architecture should look like — Chapter 23. How will the two networks be connected? Are some remote offices likely to merge? What does the new security model or security perimeter look like?

  • Ask senior management about corporate-identity issues, such as account names, email address format, and domain name. Do the corporate identities need to merge or stay separate? Which implications does this have for the email infrastructure and Internet-facing services?

  • Learn whether any customers or business partners of either company will be sensitive to the merger and/or want their intellectual property protected from the other company.

  • Compare the security policies, looking, in particular, for differences in privacy policy, security policy, and means to interconnect with business partners.

  • Check the router tables of both companies, and verify that the Internet Protocol (IP) address space in use doesn’t overlap. (This is particularly a problem if you both use RFC 1918 address space.)

  • Consider putting a firewall between the two companies until both have compatible security policies.