Proposal

The proposal is for TPA/web to develop and maintain a new lektor translation plugin, tentatively given the placeholder name "new-translation-plugin". This new plugin will replace the current lektor-i18n-plugin.

Background

A note about terminology: This proposal will refer to a lektor plugin currently used by TPA named "lektor-i18n-plugin", as well as a proposed new plugin. Due to the potential confusion between these names, the currently-in-use plugin will be referred to exclusively as "lektor-i18n-plugin", and the proposed new plugin will be referred to exclusively as "new-translation-plugin", though this name is not final.

The tpo/web repos use the lektor-i18n-plugin to provide gettext-style translation for both html templates and contents.lr files. Translation is vital to our sites, and lektor-i18n-plugin seems to be the only plugin providing translation (if others exist, I haven't found them). lektor-i18n-plugin is also the source of a lot of trouble for web and TPA:

  • Multiple builds are required for the plugin to work
  • Python versions > 3.8.x make the plugin produce garbled POT files. For context, the current Python version at time of writing is 3.10.2, and 3.8.x is only receiving security updates.

Several attempts have been made to fix these pain points:

  • Multiple builds: tpo/web/lego#30 shows an attempt to refactor the plugin to provide an easily-usable interface for scripts. It's had work on and off for the past 6 months, with no real progress being made.
  • Garbled POT files: tpo/web/team#21 details the bug, where it occurs, and a workaround. The workaround only prevents bad translations from ending up in the site content; it doesn't fix the underlying issue of bad POT files being created. This fix hasn't been patched or upstreamed yet, so the web team is stuck on Python 3.8.

Making fixes like these is hard. The lektor-i18n-plugin is one massive file, and tracing the logic and control flow is difficult. In the case of tpo/web/lego#30, the attempts at refactoring the plugin were abandoned because of the massive amount of work needed to debug small issues. lektor-i18n-plugin also seems relatively unmaintained, with only a handful of commits in the past two and a half years, many of them made by Tor contributors.

After attempting to work around and fix some of the issues with the plugin, I've come to the conclusion that starting from scratch would be easier than trying to maintain lektor-i18n-plugin. lektor-i18n-plugin is fairly large and complex, but I don't think it needs to be. Using Lektor's VirtualSourceObject class should completely eliminate the need for multiple builds without any additional work, and using PyBabel directly (instead of calling gettext through popen) will give us a more flexible interface, allowing for out-of-the-box support for things like translator comments and ignoring HTML tags, which lektor-i18n-plugin seemingly doesn't support.
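As a rough illustration of what "using PyBabel directly" could look like, the sketch below builds a small POT template in memory with Babel (the library behind the pybabel command) and attaches a translator comment to one entry. This is not the planned implementation: the build_pot helper, the project name, the file paths, and the sample strings are all hypothetical and exist only for the example.

```python
from io import BytesIO

from babel.messages.catalog import Catalog
from babel.messages.pofile import write_po


def build_pot(entries):
    """Build POT text from (msgid, (path, lineno), comment) tuples.

    Hypothetical helper: the entry format is an assumption made for
    this example, not something defined by Lektor or Babel.
    """
    catalog = Catalog(project="example-site", version="0.1")
    for msgid, (path, lineno), comment in entries:
        catalog.add(
            msgid,
            locations=[(path, lineno)],
            # auto_comments are written as "#." lines, which is how
            # gettext represents comments addressed to translators.
            auto_comments=[comment] if comment else [],
        )
    buffer = BytesIO()  # write_po emits bytes, so a binary buffer is needed
    write_po(buffer, catalog, omit_header=True)
    return buffer.getvalue().decode("utf-8")


pot_text = build_pot([
    ("Browse privately.", ("content/contents.lr", 3), "Homepage headline"),
    ("Download", ("templates/layout.html", 12), None),
])
print(pot_text)
```

Because the catalog is an ordinary Python object, the plugin would not have to spawn an external gettext process or parse its output, and per-entry metadata such as comments and source locations can be set as the strings are collected.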

Using code and/or ideas from lektor-i18n-plugin will help ease the development of a new-translation-plugin. Many of the concepts behind lektor-i18n-plugin (marking contents.lr fields as translatable, databag translation, etc.) are sound, and already implemented. Even if none of the code is reused, there's already a reference for those concepts.

By using PyBabel, VirtualSourceObject, and referencing lektor-i18n-plugin, new-translation-plugin's development and maintenance should be far easier than continuing to work around or fix lektor-i18n-plugin.

Alternatives Considered

During the draft phase of this RFC, several alternatives were brought up and considered. Here's the conclusion I came to for each of them:

Fix the existing plugin ourselves

Unfortunately, fixing the original plugin ourselves would take a large amount of time and effort. I've spent months on-and-off trying to refactor the existing plugin enough to let us do what we need to with it. The current plugin has no tests or documentation, so patching it means spending time getting familiar with the code, changing something, running it to see if it breaks, and finally trying to figure out what went wrong without any information about what happened. We would have to start almost from scratch anyway, so starting with the existing plugin would mostly just eat more time and energy.

Paying the original/external developers to fix our issues with the plugin

This solution would at least free up a TPA member during the entire development process, but it still comes with a lot of the issues of fixing the plugin ourselves. The problem I'm most concerned with is that at the end of the new plugin's development, we won't have anyone familiar with it. If something breaks in the future, we're back in the same place we are now. Building the new plugin in-house means that at least one of us knows how the plugin works at a fundamental level, and we can take care of any problems that might arise.

Replacing lektor entirely

The most extreme solution to our current problems is to drop lektor entirely, and look into a different static site generator. I've looked into some popular alternative SSGs, and haven't found any that match our needs. Most of them have their own translation system that doesn't use GNU gettext translations. We currently do our translations with transifex, and are considering weblate; both of those sites use gettext translation templates "under the hood", meaning that if an SSG doesn't have a gettext translation plugin, we'd have to write one or vastly change how we do our translations. So even if porting the site to a different SSG was less work than developing a new lektor plugin, we'd still need to write a new plugin for the new SSG, or change how we do translations.

  • Jekyll:
    • jekyll-multiple-languages-plugin seems to be the most-used plugin based on github stars. It doesn't support gettext translations, making it incompatible with our current workflow.
    • I spent about 1.5 to 2 hours trying to "port" the torproject.org homepage to Jekyll. Jekyll's templating system (liquid) works very differently than Lektor's templating system (Jinja 2). I gave up trying to port it when I realized that a simple 1:1 translation of the templates wouldn't be possible, and the way our templates work would need to be re-thought from the ground up to work in Liquid. Keep in mind that I spent multiple hours trying to port a single page, and was unable to do it.
  • Pelican:
    • Built-in translation, no support for gettext translation. See above why we need gettext.
  • Hexo:
    • Built-in translation, no support for gettext translation.
  • Hugo:
    • Built-in translation, no support for gettext translation.

Given the amount of work that would need to go into changing the SSG (not to mention changing the translation system), I don't think replacing Lektor is feasible. With the SSGs listed we would need to either re-do our translation setup or write a new plugin (both of which would take as much effort as a new lektor translation plugin), and we'd also need to spend an enormous amount of time porting our existing content to the new SSG. I wasn't able to work in the SSGs listed enough to be able to give a proper estimate, but I think it's safe to say that moving our content to a new SSG would be more effort than a new plugin.

Plugin Design

The planned outline of the plugin looks something like this:

  1. The user clones a web repo, initializes submodules, and clones the correct translation.git branch into the /i18n folder (path relative to the repo root), and installs all necessary dependencies to build the lektor site
  2. The user runs lektor build from the repo root
  3. Lektor emits the setup-env event, which is hooked by new-translation-plugin to add the _ function to templates
  4. Lektor emits the before-build-all event, which is hooked by new-translation-plugin
  5. new-translation-plugin regenerates the translation POT file
  6. new-translation-plugin updates the PO files with the newly-regenerated POT file
  7. new-translation-plugin generates a new TranslationSource virtual page for each page's translations, then adds the pages to the build queue
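Steps 5 to 7 above can be sketched without any dependencies to make the intended data flow concrete. The following is only a simplified model under stated assumptions: pages are plain dictionaries, catalogs are msgid-to-msgstr mappings, and every function name is hypothetical. A real plugin would more likely work with Lektor records and Babel catalogs instead of these stand-ins.

```python
from itertools import product


def extract_msgids(pages, translatable_fields):
    """Step 5: collect the unique translatable strings in first-seen order."""
    seen = {}
    for page in pages:
        for field in translatable_fields:
            text = page.get(field)
            if text:
                seen.setdefault(text, None)  # dicts preserve insertion order
    return list(seen)


def update_catalog(msgids, existing):
    """Step 6: keep known translations, add empty entries, drop obsolete ones."""
    return {msgid: existing.get(msgid, "") for msgid in msgids}


def plan_translated_pages(pages, languages):
    """Step 7: one (path, language) build target per page and language."""
    return [(page["path"], lang) for page, lang in product(pages, languages)]


# Hypothetical sample data standing in for contents.lr fields.
pages = [
    {"path": "/", "title": "Tor Project", "body": "Browse privately."},
    {"path": "/download", "title": "Download", "body": "Browse privately."},
]
msgids = extract_msgids(pages, ["title", "body"])
spanish = update_catalog(
    msgids, {"Download": "Descargar", "Old banner": "Banner antiguo"}
)
targets = plan_translated_pages(pages, ["es", "de"])
```

In this model the template is regenerated from the content on every build, so a string removed from the site ("Old banner") drops out of the updated catalog, while an existing translation ("Descargar") survives the merge.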

Impact on Related Roadmaps/OKRs

The development of a new plugin could take quite a while. As a rough estimate, it could take at least a month for the plugin to be completed, assuming everything goes well. Taking time away from our OKRs to work exclusively on this plugin could set back our OKR timelines by a lot. On the other hand, if we're able to complete the plugin quickly we can streamline some of our web objectives by removing issues with the current plugin.

This plugin would also greatly reduce the build time of lektor sites, since they wouldn't need to be built three times. This would make the web "OKR: make it easier for translators to contribute" about 90% complete.