Opened 4 years ago

Closed 4 years ago

#2625 closed change (fixed)

[cms] Add Crowdin synchronisation script

Reported by: kzar Assignee: kzar
Priority: P2 Milestone:
Module: Sitescripts Keywords:
Cc: fhd, trev, sebastian, saroyanm Blocked By:
Blocking: #2035 Platform: Unknown
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29317015/

Description (last modified by kzar)

Background

Our CMS, which powers several of our websites, has built-in translation support. A default "master" language is chosen, and strings are written inline inside pages in that language. Translations for other languages are stored in JSON files using the same structure that Crowdin and Chrome extensions use for translations.

Updating these translation files manually is a lot of work, so we are going to use the online translation service Crowdin instead. To do this we need an automated way to synchronise page strings and translations with Crowdin. We need to be able to extract and upload the master strings from all the pages, upload any existing translations, and download any new translations from Crowdin. (More things need to happen in the background to facilitate these functions.)

What to change

Add a Python script cms.bin.translate that takes the source repository path and a Crowdin API key as parameters.
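
A minimal sketch of what the command-line interface could look like, assuming two plain positional arguments. The argument names source_dir and crowdin_api_key are illustrative only and are not taken from the actual script.

```python
import argparse


def parse_args(argv=None):
    """Parse the two parameters named in this ticket (argument names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Synchronise CMS page strings and translations with Crowdin"
    )
    parser.add_argument("source_dir", help="path to the website source repository")
    parser.add_argument("crowdin_api_key", help="API key for the Crowdin project")
    return parser.parse_args(argv)
```

Under these assumptions the script would be started with something like python -m cms.bin.translate followed by the repository path and the API key.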

For each page the script should combine inline strings and comments with any strings specified for the default locale, generate a JSON file and upload it to Crowdin.
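
The merge step above might be sketched as follows. The input shapes, the function name build_locale_json, and the rule that inline strings win over default-locale strings on a name clash are all assumptions made for illustration; only the output layout (string name mapped to "message" and an optional "description") follows the Chrome extension format mentioned in the background.

```python
import json


def build_locale_json(inline_strings, default_locale_strings):
    """Merge inline page strings with strings stored for the default locale.

    inline_strings: {name: (message, comment_or_None)}   (hypothetical shape)
    default_locale_strings: {name: {"message": ..., "description": ...}}
    """
    # Start from a copy of the default-locale entries so the input is not mutated.
    merged = {name: dict(entry) for name, entry in default_locale_strings.items()}
    for name, (message, comment) in inline_strings.items():
        entry = {"message": message}
        if comment:
            entry["description"] = comment
        # Assumption: an inline string replaces a default-locale string of the same name.
        merged[name] = entry
    return json.dumps(merged, ensure_ascii=False, indent=2, sort_keys=True)
```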

For pages that already exist in the Crowdin project we need to use the Crowdin API's update file endpoint.

For pages that don't already exist in the Crowdin project we will need to use the add file endpoint instead. For these pages we will also need to upload any pre-existing translations that we have locally to Crowdin.

(Before updating or adding any files we need to ensure the required directories all exist in the Crowdin project, using the API. We will also need to ensure all required locales are enabled for the project.)
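
The choice between the add and update endpoints, and the list of directories that must exist first, can be worked out before any API call is made. The sketch below assumes relative, slash-separated file paths and that the set of files already in the Crowdin project has been fetched separately; the function names are hypothetical and no Crowdin API calls are shown.

```python
import posixpath


def plan_uploads(local_files, remote_files):
    """Split local page files into those to add and those to update."""
    local, remote = set(local_files), set(remote_files)
    to_add = sorted(local - remote)      # not in the project yet: add file
    to_update = sorted(local & remote)   # already in the project: update file
    return to_add, to_update


def required_directories(local_files):
    """Return every parent directory of the given files, parents before children."""
    directories = set()
    for path in local_files:
        parent = posixpath.dirname(path)
        while parent not in ("", "/"):
            directories.add(parent)
            parent = posixpath.dirname(parent)
    # Sorting by depth guarantees a directory is listed after its parent.
    return sorted(directories, key=lambda d: (d.count("/"), d))
```

Files returned in to_add are also the ones whose pre-existing local translations would need uploading.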

After all page strings have been extracted and uploaded, the script needs to request that a fresh export archive of the translation files be generated. Once that has completed, the archive needs to be downloaded and extracted locally, updating the translations and bringing everything in sync with the Crowdin project.
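
A sketch of the local extraction step only, assuming the export archive is a zip file that has already been downloaded into memory and that its members map directly onto paths below the local locales directory. Both points are assumptions, as is the function name; the download itself is not shown.

```python
import io
import os
import zipfile


def extract_translations(archive_bytes, locales_dir):
    """Extract a downloaded export archive into locales_dir; return the written paths."""
    written = []
    root = os.path.realpath(locales_dir)
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            target = os.path.realpath(os.path.join(root, info.filename))
            # Refuse members that would end up outside locales_dir.
            if os.path.commonpath([root, target]) != root:
                raise ValueError("unsafe path in archive: " + info.filename)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with archive.open(info) as source, open(target, "wb") as dest:
                dest.write(source.read())
            written.append(os.path.relpath(target, root))
    return sorted(written)
```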

(Any old pages or directories that exist in the Crowdin project but not locally should be removed, along with any strings and translations. Any old locale files for removed pages should be cleared locally during the synchronisation process as well.)
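
The clean-up described above amounts to two set differences. The sketch below assumes page names without file extensions and a local layout of locales/<locale>/<page>.json; that layout and the function names are assumptions for illustration, and the actual deletion (remote or local) is left out.

```python
import posixpath


def stale_remote_files(local_pages, remote_files):
    """Files present in the Crowdin project that no longer have a local page."""
    return sorted(set(remote_files) - set(local_pages))


def stale_locale_files(pages, locale_files):
    """Local locale files whose page no longer exists.

    locale_files: paths such as "locales/de/about.json" (assumed layout).
    """
    stale = []
    for path in locale_files:
        # Split off "locales" and the locale code; the rest identifies the page.
        _, _locale, page_file = path.split("/", 2)
        page = posixpath.splitext(page_file)[0]
        if page not in pages:
            stale.append(path)
    return sorted(stale)
```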

Notes:

  • We already have comparable functionality in buildtools.
  • The JSON files generated should of course be in the same format as all the other locale files.
  • The default locale is specified by the settings.ini file in the repository path, inside the general section. (There's likely already code written to check the defaultlocale.)
  • Crowdin project name should be added to settings.ini file.
  • Consider dynamically generated pages, such as subscription list and interface pages. Ensure that any dynamically generated, translatable strings are included in the generated JSON files.
  • The cms.utils.get_page_params functionality should be re-used to extract the page strings. We do not want to re-implement this logic.
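
Reading the two settings mentioned in the notes could look roughly like this. The defaultlocale option in the general section comes from the note above; the option name crowdin-project-name and its placement in the general section are hypothetical, since the ticket only says the project name should be added to settings.ini.

```python
import configparser


def read_sync_settings(settings_text):
    """Return (default_locale, crowdin_project) from the text of a settings.ini file."""
    config = configparser.ConfigParser()
    config.read_string(settings_text)
    default_locale = config.get("general", "defaultlocale")
    # "crowdin-project-name" is an assumed option name, not confirmed by this ticket.
    project = config.get("general", "crowdin-project-name", fallback=None)
    return default_locale, project
```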

Change History (10)

comment:1 Changed 4 years ago by trev

  • Cc trev added
  • Description modified (diff)
  • Summary changed from [cms] Add a way to export English locale files ready for Crowdin import to [cms] Add a way to upload master translation to Crowdin

comment:2 Changed 4 years ago by sebastian

Note that jinja2 already comes with functionality to extract source translations from templates:
http://jinja.pocoo.org/docs/dev/extensions/#jinja2.Environment.extract_translations

While we don't use jinja2's i18n extension, you might still want to reuse that code if possible. Or at least it should give you an idea of how strings can be extracted from templates.

comment:3 Changed 4 years ago by sebastian

  • Cc sebastian added
  • Priority changed from Unknown to P2
  • Ready set

comment:4 Changed 4 years ago by kzar

Thanks, I already have it more or less working. I'm just resolving some issues with the upload and tidying the code for review. I didn't find extracting the strings too bad; we had solved similar problems in the CMS and conversion script, so I had somewhere to start from.

Last edited 4 years ago by kzar

comment:5 Changed 4 years ago by trev

Jinja2's functionality is meant for translations that aren't marked up. In our case they are marked up, so we shouldn't be guessing.

Note that we should not have yet another layer of code extracting strings (and resolving includes and whatever else). We should instead run the usual processing for the pages (via utils.get_page_params()) and make sure that Converter.localize_string() saves the master translation so that we can use it.

comment:6 Changed 4 years ago by sebastian

Note that this won't consider translations in branches that aren't executed like:

{% if might_be_false %}
  {{ "Hello World"|translate("s1") }}
{% endif %}

But as I just discussed with trev, that should still be good enough for us, as long as we make sure to generate all pages while extracting translations.

comment:7 Changed 4 years ago by kzar

  • Description modified (diff)
  • Summary changed from [cms] Add a way to upload master translation to Crowdin to [cms] Add Crowdin synchronisation script

comment:8 Changed 4 years ago by kzar

  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

comment:9 Changed 4 years ago by saroyanm

  • Cc saroyanm added

comment:10 Changed 4 years ago by kzar

  • Resolution set to fixed
  • Status changed from reviewing to closed
  • Tester set to Unknown