Agenda
- Roll call: who's there and emergencies
- Roadmap review
- Triage rotation
- Holiday planning
- TPA survey review
- Other discussions
- Next meeting
- Metrics of the month
Roll call: who's there and emergencies
anarcat, hiro, gaba, no emergencies
The meeting took place on IRC because there was too much noise on anarcat's end.
Roadmap review
We did a lot of cleanup in the dashboard:
https://gitlab.torproject.org/tpo/tpa/team/-/boards
In general, the following items were prioritized:
- GitLab CI
- finish setting up the Cymru network, especially the VPN
- BTCpayserver
- Tor Browser build boxes
- small tickets like the git stuff and triage (see below)
The following items were punted to the future:
- SVN retirement (to January)
- password management (specs in January?)
- Puppet role account and verifications
We briefly discussed Grafana authentication, prompted by a request to create a new account on grafana2. anarcat said the current model of managing the htpasswd file in Puppet doesn't scale well, because we need to change that file through Puppet every time we grant access or reset a password. He listed three possible authentication mechanisms:
- htpasswd managed in Puppet (status quo)
- Grafana users (disabling the htpasswd, basically)
- LDAP authentication
The current authentication model was picked because we wanted to automate user creation in Puppet, and because it's hard to create users in Grafana from Puppet. When a new Grafana server is set up, there's a small window during which an attacker could create an admin account, which we were also trying to counter. But maybe those concerns are moot now.
We also discussed password management, but that will be worked on in January. We'll try to set a roadmap for 2021 in January, after the results of the survey have come in.
Triage rotation
hiro brought up the idea of rotating the triage work instead of always having the same person do it. Right now, anarcat looks at the board at the beginning of every week and deals with tickets in the "Open" column. Often, he just takes the easy tickets, drops them in ~Next, and does them. Other times, tickets end up in ~Backlog, get closed, or at least get a response of some sort.
We agreed to switch that responsibility every two weeks.
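As a rough illustration of the two-week rotation, here is a minimal Python sketch. The roster order, the start date, and the function name are assumptions made for the example. The minutes only record that the responsibility switches every two weeks, not who goes first or when the rotation starts.

```python
from datetime import date

# Hypothetical roster and start date: the minutes only say the duty
# switches every two weeks, not who starts or on which day.
ROSTER = ["anarcat", "hiro"]
ROTATION_START = date(2020, 12, 7)  # assumed start of the first shift
ROTATION_DAYS = 14                  # "every two weeks"


def triage_owner(day, roster=ROSTER, start=ROTATION_START, period=ROTATION_DAYS):
    """Return the name of whoever handles triage on the given day."""
    if day < start:
        raise ValueError("day is before the rotation started")
    # Count how many full two-week shifts have elapsed since the start,
    # then cycle through the roster.
    shift = (day - start).days // period
    return roster[shift % len(roster)]


if __name__ == "__main__":
    print(triage_owner(date.today()))
```

Adding a third person to the roster would not require any other change, since the shift index simply wraps around the list.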
Holiday planning
anarcat is off from the 14th to the 26th, hiro from the 30th to January 14th.
TPA survey review
anarcat is working on a survey to get information from our users to plan the 2021 roadmap.
People liked the survey in general, but found the "services" questions too long. It was suggested to remove services TPA has nothing to do with (like websites or metrics stuff like check.tpo). But anarcat pointed out that we need to know which of those services are important: for example, right now we "just know" that check.tpo is important, but it would be nice to have hard data that confirms it.
anarcat agreed to split the table by team so that it doesn't look as long, and will submit the survey for review again by the end of the week.
Other discussions
New intern
MariaV just started as an Outreachy intern to work on the Anonymous Ticket System. She may be joining the #tpo-admin channel and may join the GitLab/tooling meetings.
Welcome MariaV!
Next meeting
Quick check-in on December 29th, same time.
Metrics of the month
- hosts in Puppet: 79, LDAP: 82, Prometheus exporters: 133
- number of Apache servers monitored: 28, hits per second: 205
- number of nginx servers: 2, hits per second: 3, hit ratio: 0.86
- number of self-hosted nameservers: 6, mail servers: 12
- pending upgrades: 1, reboots: 0
- average load: 0.34, memory available: 1.80 TiB/2.39 TiB, running processes: 481
- bytes sent: 245.34 MB/s, received: 139.99 MB/s
- GitLab tickets: 129 issues including...
- open: 0
- icebox: 92
- backlog: 20
- next: 9
- doing: 8
- (closed: 2130)
The upgrade prediction graph has been retired since it keeps predicting the upgrades will be finished in the past, which no one seems to have noticed in the last report (including me).
Metrics are also available on the main Grafana dashboard. Head to https://grafana.torproject.org/, change the time period to 30 days, and wait a while for results to render.