A "status" dashboard is a simple website that allows service admins to clearly and simply announce down times and recovery.

Note that this could be considered part of the documentation system, but it is documented separately.

The site is at https://status.torproject.org/ and the source at https://gitlab.torproject.org/tpo/tpa/status-site/.

Tutorial

Local development environment

To install the development environment for the status site, you should have a copy of the Hugo static site generator and the git repository:

sudo apt install hugo
git clone --recursive -b main https://gitlab.torproject.org/tpo/tpa/status-site.git
cd status-site

WARNING: the URL of the Git repository changed! It used to be hosted at GitLab, but is now hosted at Gitolite. The repository is mirrored to GitLab, but pushing there will not trigger build jobs.

Then you can start a local development server to preview the site with:

hugo serve --baseURL=http://localhost/
firefox http://localhost:1313/

The content can also be built in the public/ directory with, simply:

hugo

Creating new issues

Issues are stored in content/issues/. You can create a new issue with hugo new, for example:

hugo new issues/2021-02-03-testing-cstate-again.md

This creates the file from a pre-filled template (called an archetype in Hugo) and puts it in content/issues/2021-02-03-testing-cstate-again.md.

If you do not have hugo installed locally, you can also copy the template directly (from themes/cstate/archetypes/default.md), or copy an existing issue and use it as a template.

Otherwise the upstream guide on how to create issues is fairly thorough and should be followed.

In general, keep in mind that the date field is when the issue started, not when you posted the issue. See this feature request asking for an explicit "update" field.

Also note that you can add draft: true to the front-matter (the block on top) to keep the post from being published on the front page before it is ready.
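If hugo is not installed, the file can also be written by hand. The following is a minimal sketch, not the actual archetype: the new_issue helper is hypothetical, and every front-matter field other than date and draft is an assumption that should be compared with themes/cstate/archetypes/default.md before use.

```sh
#!/bin/sh
# Sketch: create a draft issue file without hugo.
# Only "date" and "draft" come from this page; the other fields are
# assumptions, check themes/cstate/archetypes/default.md for the real ones.

new_issue() {
    slug=$1        # e.g. testing-cstate-again
    started=$2     # when the issue STARTED, e.g. 2021-02-03T10:00:00Z
    day=${started%%T*}
    file="content/issues/${day}-${slug}.md"
    mkdir -p content/issues
    cat > "$file" <<EOF
---
title: $slug
date: $started
resolved: false
severity: notice
affected:
  - v3 onion services
section: issue
draft: true
---

Describe the problem here.
EOF
    printf '%s\n' "$file"
}
```

Calling new_issue testing-cstate-again 2021-02-03T10:00:00Z from the top of the repository writes the file and prints its path. Edit the title and body, then remove draft: true once the post is ready to be published.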

Uploading site to the static mirror system

Uploading the site is automated by continuous integration. So you simply need to commit and push:

git commit -a -myolo
git push

Note that only the TPA group has access to the repository for now, but other users can request access as needed.

You can see the progress of build jobs in the GitLab CI pipelines. If all goes well, successful webhook deliveries should show up in this control panel as well.

If all goes well, the changes should propagate to the mirrors within a few seconds to a minute.

See also the disaster recovery options below.

Keep in mind that this is a public website. You might want to talk with the comms@ people before publishing big or sensitive announcements.

How-to

Changing categories

cState relies on "systems" which live inside a "category". For example, the "v3 onion services" system is in the "Tor network" category. Both are defined in the config.yml file, and each issue (in content/issues) refers to one or more "systems" affected by it.
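Because issues refer to systems by name, a typo in an issue can leave it pointing at a system that does not exist. The following is a rough sketch of a check for that. The missing_systems helper is hypothetical, and it assumes that names are unquoted, that config.yml declares them as "- name: ..." entries, and that issues list them under an "affected:" key.

```sh
#!/bin/sh
# Sketch: print the "affected" systems of an issue that config.yml does not
# define. The file layout (unquoted "- name: ..." entries, an "affected:"
# list in the front matter) is an assumption, not taken from this page.

missing_systems() {
    config=$1
    issue=$2
    # Collect the list items that directly follow the "affected:" key.
    awk '
        /^affected:/ { in_list = 1; next }
        in_list && /^[ \t]*-[ \t]+/ { sub(/^[ \t]*-[ \t]+/, ""); print; next }
        { in_list = 0 }
    ' "$issue" | while IFS= read -r name; do
        # Report names with no matching "name:" entry in the configuration.
        grep -Fq -- "name: $name" "$config" || printf '%s\n' "$name"
    done
}
```

Running missing_systems config.yml content/issues/2021-02-03-testing-cstate-again.md prints nothing when every listed system is known. The match is deliberately loose: it also accepts category names and name prefixes, so treat it as a typo catcher rather than a validator.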

Theming

The logo lives in static/logo.png. Some colors are defined in config.yml; search for "Colors throughout cState" in that file.

Pager playbook

No monitoring specific to this service exists.

Disaster recovery

It should be possible to deploy the static website anywhere that supports plain HTML, assuming you have a copy of the git repository.

The instructions in all of the subsections below assume you have a copy of the git repository.

Important: make sure you follow the installation instructions to also clone the submodules!

If the git repository is not available, you could start from scratch using the example repository as well.

From here on, it is assumed you have a copy of the git repository (or the example one).

These procedures have not been tested.

Manual deployment to the static mirror system

If GitLab is down, you can upload the public/ folder content under /srv/static-gitlab-shim/status.torproject.org/.

The canonical source for the static websites rotation is defined in Puppet (in modules/staticsync/data/common.yaml) and is currently set to static-gitlab-shim.torproject.org. This rsync command should be enough:

rsync -rtP public/ static-gitlab-shim@static-gitlab-shim.torproject.org:/srv/static-gitlab-shim/status.torproject.org/public/

This might require adding your key to /etc/ssh/userkeys/static-gitlab-shim.more.

Then the new source material needs to be synchronized to the mirrors, with:

sudo -u mirroradm static-update-component status.torproject.org

This requires access to the mirroradm group, although typically the machine is only accessible to TPA anyways.

Don't forget to push the changes to the git repository, once that is available. It's important so that the next people can start from your changes:

git commit -a -myolo
git push
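The manual steps above can be strung together in a small wrapper. This is an untested sketch: the run and manual_deploy helpers and the RUN variable are hypothetical, the commands themselves are the ones listed above, and by default nothing is executed, only printed. This page does not say on which host static-update-component must run, so that step is always printed as a reminder.

```sh
#!/bin/sh
# Sketch: manual deployment of the status site, dry-run by default.
# Set RUN=1 to actually execute the build and upload steps.
SHIM=static-gitlab-shim@static-gitlab-shim.torproject.org
DEST=/srv/static-gitlab-shim/status.torproject.org/public/

run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

manual_deploy() {
    run hugo                               # build the site into public/
    run rsync -rtP public/ "$SHIM:$DEST"   # upload to the static source
    # Assumption: run this wherever static-update-component is installed.
    printf 'then run: %s\n' \
        'sudo -u mirroradm static-update-component status.torproject.org'
}
```

Calling manual_deploy without RUN=1 prints the commands for review. Running it with RUN=1 from the top of the repository builds and uploads the site, after which the mirror synchronization still has to be triggered by hand.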

Netlify deployment

Upstream has instructions to deploy to Netlify, which, in our case, might be as simple as following this link and filling in those settings:

  • Build command: hugo
  • Publish directory: public
  • Add one build environment variable
    • Key: HUGO_VERSION
    • Value: 0.48 (or later)

Then, of course, DNS needs to be updated to point there.

GitLab pages deployment

A site could also be deployed on another GitLab server with "GitLab pages" enabled. For example, if the repository is pushed to https://gitlab.com/, the GitLab CI/CD system there will automatically pick up the configuration and run it.

Unfortunately, due to the heavy customization we used to deploy the site to the static mirror system, the stock .gitlab-ci.yml file will likely not work on another system. An alternate .gitlab-ci-pages.yml file should be available in the Git repository and can be activated in the GitLab project in Settings -> CI/CD -> CI/CD configuration file.

That should give you a "test" GitLab pages site with a URL like:

https://user.gitlab.io/tpa-status/

To transfer the real site there, you need to go into the project's Settings -> Pages section and hit New Domain.

Enter status.torproject.org there, which will ask you to add a TXT record in the torproject.org zone.

Add the TXT record to domains.git/torproject.org, commit and push, then hit the "Retry verification" button in the GitLab interface.

Once the domain is verified, point the status.torproject.org domain to the new backend:

status CNAME user.gitlab.io

For example, in my case, it was:

status CNAME anarcat.gitlab.io

See also the upstream documentation for details.

Those are the currently known mirrors of the status site:

Reference

Installation

See the instructions on how to set up a local development environment and the design section for more information on how this is set up.

Upgrades

Upgrades to the software are performed by updating the cstate submodule.

Since November 2023, the renovate-cron bot passes through the project to make sure that submodule is up to date.

Hugo itself is managed through the Debian packages provided as part of the bookworm container, and therefore benefits from the normal Debian support policies. Major Debian upgrades need to be manually performed in the .gitlab-ci.yml file and are not checked by renovate.

SLA

This service should be highly available. It should survive the failure of one or all points of presence: if all fail, it should be easy to deploy it to a third-party provider.

Design and architecture

The status site is part of the static mirror system and is built with cstate, which is a theme for the Hugo static site generator. The site is managed in a git repository on the GitLab server and uses GitLab CI to get built. The static-shim service propagates the builds to the static mirror system for high availability.

See the static-shim service design document for more information.

Services

No service other than the above external services is required to run this service.

Queues

There are no queues or schedulers for that service, although renovate-cron will pass by the project to check for updates once in a while.

Interfaces

Authentication

Implementation

Status is mostly written in Markdown, but the upstream code is written in Golang and its templating language.

Issues

File or search for issues in the status-site tracker.

Upstream issues can be found and filed in the GitHub issue tracker.

Users

TPA is the main maintainer of this service and therefore its most likely user, but the network health team are frequent users as well.

Naturally, any person interested in the Tor project and the health of the services is also a potential user.

Upstream

cState is a pretty collaborative and active upstream. It is seeing regular releases and is considered healthy, especially since most of the implementation is actually in hugo, another healthy project.

Monitoring and metrics

No metrics for this service are currently defined in Prometheus, outside of normal web server monitoring.

Tests

New changes to the site are manually checked by browsing a rendered version of the site and clicking around.

This can be done on a local copy before even committing, or it can be done with a review site by pushing a branch and opening a merge request.

Logs

There are no logs or metrics specific to this service; see the static site service for details.

A history of deployments and past versions of the code is of course available in the Git repository history and the GitLab job logs.

Backups

Does not need special backups: backed up as part of the regular static site and git services.

Other documentation

Discussion

Overview

This project comes from two places:

  1. during the 2020 TPA user survey, some respondents suggested documenting "down times of 1h or longer" and communicating better about service statuses

  2. separately, following a major outage in the Tor network due to a DDoS, the network and network health teams asked for a dashboard to inform Tor users about such problems in the future

This is therefore a project spanning multiple teams, with different stakeholders. The general idea is to have a site (say status.torproject.org) that simply shows users how things are going, in an easy to understand form.

Security and risk assessment

No security audit was performed of this service, but considering it only manages static content accessed by trusted users, its exposure is considered minimal.

It might be the target of denial of service attacks, like the rest of the static mirror system. A compromise of the GitLab infrastructure would also naturally give access to the status site.

Finally, if an outage affects the main domain name (torproject.org) this site could suffer as well.

Technical debt and next steps

The service should probably be moved onto an entirely different domain, managed on a different registrar, using keys stored in a different password manager.

There used to be no upgrades performed on the site, but that was fixed in November 2023, during the Hackweek.

Goals

In general, the goal is to provide a simple interface to provide users with status updates.

Must have

  • user-friendly: the public website must be easy to understand by the wider Tor community of users (not just TPI/TPA)
  • status updates and progress: "post status problem we know about so the world can learn if problems are known to the Tor team."
    • example: "[recent] v3 outage where we could have put out a small FAQ right away (go static HTML!) and then update the world as we figure out the problem but also expected return to normal."
  • multi-stakeholder: "easily editable by many of us namely likely the network health team and we could also have the network team to help out"
  • simple to deploy and use: pushing an update shouldn't require complex software or procedures. editing a text file, committing and pushing, or building with a single command and pushing the HTML, for example, is simple enough. installing a MySQL database and PHP server, for example, is not simple enough.
  • keep it simple
  • free-software based

Nice to have

  • deployment through GitLab (pages?), with contingency plans
  • separate TLD to thwart DNS-based attacks against torproject.org
  • same tool for multiple teams
  • per-team filtering
  • RSS feeds
  • integration with social media?
  • responsive design

Non-Goals

  • automation: updating the site is a manual process. no automatic reports of sensors/metrics or Nagios, as this tends to complicate the implementation and cause false positives

Approvals required

TPA, network team, network health team.

Proposed Solution

We're experimenting with cstate because it's the only static website generator with such a nice template out of the box that we could find.

Cost

Just research and development time. Hosting costs are negligible.

Alternatives considered

Those are the status dashboards we know about and that are still somewhat in active development:

Abandonware

These were evaluated in a previous life but ended up being abandoned upstream:

  • Overseer - used at Disqus.com, Python/Django, user-friendly/simple, administrator non-friendly, twitter integration, Apache2 license, development stopped, Disqus replaced it with Statuspage.io
  • Stashboard - used at Twilio, MIT license, demo, Twitter integration, REST API, abandon-ware, no authentication, no Unicode support, depends on Google App engine, requires daily updates
  • Baobab - previously used at Gandi, replaced with statuspage.io, Django based

Hacks

Those were discarded because they do not provide an "out of the box" experience:

  • use Jenkins to run jobs that check a bunch of things and report a user-friendly status?
  • just use a social network account (e.g. Twitter)
  • "just use the wiki"
  • use Drupal ("there's a module for that")
  • roll our own with Lektor, e.g. using this template
  • using GitHub issues

Example sites

Previous implementations

IRC bot

A similar service was run by @weasel around 2014. It would bridge the status comments on IRC into a website, see this archived version and the source code, which is still available.

Jenkins jobs

The site used to be built with Jenkins jobs, from a git repository on the git server. It was set up this way because that is how every other static website was built back then.

This involved:

We also considered using GitLab CI for deployment but (a) GitLab pages was not yet set up and (b) it didn't integrate well with the static mirror system at the time. See the broader discussion of the static site system improvements.

Both issues have now been fixed thanks to the static-shim service.