Summary: merge Tails rotations with TPA's star of the week into a single role, and merge Tails' and TPA's support policies.

Background

The Tails and Tor merge process created a situation in which there are now two separate infrastructures as well as two separate support processes and policies. The full infrastructure merge is expected to take 5 years to complete, but we want to prioritize merging the teams into a single entity.

Proposal

As much as reasonably possible, every team member should be able to handle issues on both TPA and Tails infrastructure. Decreasing the level of specialization will allow for sharing support workload in a way that is more even and spaced out for all team members.

Goals

Must have

  • A list of tasks that should be handled during rotations that includes triage, routine tasks and interruption handling and comprises all expectations for both the TPA "star of the week" and the Tails "sysadmin on shift"
  • A process to make sure every TPA member is able to support both infrastructures
  • Guidelines for directing users to the correct place or process to get support

Non-Goals

Merging the following is not a goal of this policy:

  • Tools used by each team
  • Mailing lists
  • Technical workflows

The goal is only to make everyone comfortable working on both sides of the infra and to merge rotation shifts.

Support tasks

TPA-RFC-2: Support defines different support levels, but in the context of this proposal we use the tasks that are the responsibility of the "star of the week" as a basis for the merge of rotation shifts:

Tails processes are merged into each of the items above, although on different timelines.

Triage of new issues

For triage of new issues, we abolish the previous processes used by Tails, and users of Tails services should now:

  • Stop creating new issues in the tpo/tpa/tails-sysadmin> project, and instead start using the tpo/tpa/team> project or dedicated projects when available (e.g. tpo/tpa/puppet-weblate>).
  • Stop using the ~"To Do" label, and start using per-service labels, when available, or the generic ~"Tails" label when the relevant Tails service doesn't have a specific label.

Triage of Tails issues will follow the same triage process as other TPA issues and, apart from the changes listed above, the process should be the same for any user requesting support.
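The routing rules above can be illustrated with a minimal Python sketch. It is not part of the policy or of any existing TPA tooling: the function name, the lookup tables, the "weblate" service key, and the "Weblate" label name are all hypothetical. Only the project paths and the generic "Tails" label come from the text above.

```python
from typing import Optional, Tuple

# Taken from the policy text above.
DEFAULT_PROJECT = "tpo/tpa/team"
GENERIC_LABEL = "Tails"

# Hypothetical lookup tables: the keys and the per-service label name are
# assumptions made for illustration only.
DEDICATED_PROJECTS = {"weblate": "tpo/tpa/puppet-weblate"}
SERVICE_LABELS = {"weblate": "Weblate"}


def route_issue(service: Optional[str]) -> Tuple[str, str]:
    """Return the (project, label) pair to use for a new Tails issue.

    A dedicated project and a per-service label are used when available;
    otherwise the issue goes to the team project with the generic label.
    """
    key = (service or "").strip().lower()
    project = DEDICATED_PROJECTS.get(key, DEFAULT_PROJECT)
    label = SERVICE_LABELS.get(key, GENERIC_LABEL)
    return project, label
```

For example, under these assumptions `route_issue("Weblate")` selects the dedicated project and the per-service label, while a service without either falls back to the team project and the generic label.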

Routine tasks

The following routine tasks are expected from the Tails Sysadmin on shift:

  • update ACLs upon request (e.g. Gitolite, GitLab)
  • major upgrades of operating systems
  • manual upgrades (such as Jenkins, Weblate, etc)
  • reboot and restart systems for security issues or faults
  • interface with providers
  • update GitLab configuration (using gitlab-config)
  • process abuse reports in Tails' GitLab

Most of these were already described in TPA's "routine" tasks, and the ones that were not are now explicitly included there. Note that, until the infra merge is complete, these tasks will have to be performed on both infras.

The following processes were explicitly mentioned as expectations of Tails Sysadmins (not necessarily on shift), and are either superseded by the processes TPA currently has in place to organize its work or simply made obsolete:

Task | Action
avoid work duplication | superseded by TPA's triage process and check-ins
support the sysadmin on shift | superseded by TPA's triage process and check-ins
cover for the sysadmin on shift after 48h of MIA | obsolete
self-evaluation of work | obsolete
shift schedule | eventually replaced by TPA rotations ("star of the week")
Jenkins upgrade (including plugins) | absorbed by TPA as a new task
LimeSurvey upgrade | absorbed by TPA with the LimeSurvey merge
Weblate upgrade | absorbed by TPA as a new task

Monitoring system

As per TPA-RFC-73, the plan is to ditch Tails' Icinga2 in favor of Tor's Prometheus, which is blocked by a significant part of the Puppet merge.

Asking the TPA crew to get used to Tails' Icinga2 in the meantime is not a good option because:

  • Tor has recently ditched Icinga, and asking them to adopt something like it once again would be demotivating
  • The system will eventually change anyway and using people's time to adopt it would not be a good investment of resources.

Because of the above, we choose to delay the merge of tasks that depend on the monitoring system until after Puppet is merged and the Tails infra has been migrated to Prometheus. The estimate is that we could start working on the migration of the monitoring system in November 2025, so we should probably not count on having that finished before the end of 2025.

This decision impacts some of the routine tasks (eg. examine disk usage, check for the need of server reboots) and "keeping an eye in the monitoring system" in general. In the meantime, we can merge triage, routine tasks that don't depend on the monitoring system and organization of incident response.

Incident response

Tails doesn't have a formal incident response process, so in this case the TPA process is just adopted as is.

Support merge process

The merge process is incremental:

  • Phase 0: Separate shifts (this is what happens now)
  • Phase 1: Triage and organization of incident response
  • Phase 2: Routine tasks
  • Phase 3: Merged support

Phase 0 - Separate shifts

This phase corresponds to what happens now: there are 2 different support teams essentially giving support for 2 different infras.

Phase 1 - Triage and organization of incident response

During this period, the TPA star of the week works in conjunction with the Tails Sysadmin on shift on the triage of new issues and, when needed, the organization of incident response.

Each week there'll be two people looking at the relevant dashboards, and they should communicate to resolve questions that may arise about triage. Similarly, if there are incidents, they'll coordinate to organize the response together.

Phase 2 - Routine tasks

Once Tails monitoring has been migrated to Prometheus, the TPA star of the week and the Tails Sysadmin on shift can start collaborating on routine tasks and, when possible, start working on issues related to "each other's infra".

In this phase we still maintain 2 different support calendars, and Tails+Tor support pairs are changed every week according to these calendars.
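As a rough illustration of how weekly pairs could be derived from the two calendars, here is a minimal Python sketch. It assumes each calendar is a mapping from an ISO week string to the person on shift; the function name, the data format, and the example names are hypothetical and do not describe how the actual calendars are stored.

```python
from typing import Dict, List, Tuple


def weekly_pairs(
    tpa_calendar: Dict[str, str],
    tails_calendar: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Pair the TPA star of the week with the Tails sysadmin on shift.

    Both arguments map a zero-padded ISO week string (e.g. "2026-W02") to a
    person. Only weeks present in both calendars produce a pair.
    """
    shared_weeks = sorted(set(tpa_calendar) & set(tails_calendar))
    return [
        (week, tpa_calendar[week], tails_calendar[week])
        for week in shared_weeks
    ]


# Hypothetical calendars, for illustration only.
tpa = {"2026-W02": "alice", "2026-W03": "bob", "2026-W04": "carol"}
tails = {"2026-W02": "dana", "2026-W03": "erin"}
pairs = weekly_pairs(tpa, tails)
```

Weeks covered by only one calendar yield no pair in this sketch. That matches the note that Tails has fewer sysadmin hours available, but how such weeks are actually handled is left to the teams.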

Note that there are many more support requests on the TPA side, and far fewer sysadmin hours on the Tails side, so this should be done proportionately. The idea is to allow for smooth onboarding of both teams on both infras, so they should support each other to make sure any questions are answered and any blockers are removed.

Some routine tasks that are not related to monitoring may start earlier than the date we set for Phase 2 in the timeline below. Upgrades to Debian Trixie are one example of an activity that will help both teams get comfortable with each other's infra: "To help with merging rotations in the two teams, TPA staff will upgrade Tails machines, with Tails folks assistance, and vice-versa."

Phase 3 - Merged support

Every TPA member is now able to conduct all routine tasks and handle triage and interrupts in both infrastructures. We abolish the "Tails Sysadmin Shifts" calendar and incorporate all TPA members in the "Star of the week" rotation calendar.

Scope

Affected users

This policy mainly affects TPA members and any user of Tails services who needs to make a support request. The most impacted users are members of the Tails Team, as they are the main users of the Tails services, and, eventually, members of the Community and Fundraising teams, as they are likely users of some Tails services such as the Tails website and Weblate.

Timeline

Phase | Timeline
Phase 0 - Separate shifts | now - mid-April 2025
Phase 1 - Triage and organization of incident response | mid-April - December 2025
Phase 2 - Routine tasks | January 2026
Phase 3 - Merged support | April 2026

References