Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#2317 closed change (fixed)

[adblockplus.org Anwiki to CMS migration] Write script to compare page sources for the live / beta sites

Reported by: kzar
Assignee: trev
Priority: P3
Milestone:
Module: Unknown
Keywords: 2015q2
Cc: trev
Blocked By:
Blocking: #2035
Platform: Unknown
Ready: yes
Confidential: no
Tester:
Verified working: no
Review URL(s):

Description (last modified by trev)

Background

We are migrating our website adblockplus.org away from Anwiki to our own CMS. As an aid to the QA team we want to write a script to compare the source code generated for every page to the existing website.

What to change

Write a script to do that. Some notes:

  • The script should list differences in the code for each page, excluding the base templates.
  • The script should compare the static files generated by the CMS with current adblockplus.org content.
  • The script should output some useful diff format that is hopefully easy to look through to spot problems.
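
A minimal sketch of what such a comparison could look like, not the script that was eventually written. It assumes the page content has already been extracted from the base templates and loaded into two dictionaries keyed by page path; the names `diff_pages`, `live_pages` and `beta_pages` are hypothetical. It uses `difflib.unified_diff` from the Python standard library to produce the output.

```python
import difflib


def diff_pages(live_pages, beta_pages):
    """Return a unified diff covering every page present on either site.

    Both arguments map a page path (e.g. "en/index") to its source text.
    A page missing on one side is treated as empty on that side.
    """
    output = []
    for path in sorted(set(live_pages) | set(beta_pages)):
        live = live_pages.get(path, "").splitlines()
        beta = beta_pages.get(path, "").splitlines()
        # unified_diff yields nothing for identical inputs, so
        # unchanged pages do not appear in the output at all.
        output.extend(difflib.unified_diff(
            live, beta,
            fromfile="live/" + path,
            tofile="beta/" + path,
            lineterm="",
        ))
    return "\n".join(output)
```

Calling `diff_pages` with two identical dictionaries returns an empty string, so only pages that actually differ need to be looked through.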

Change History (5)

comment:1 Changed 5 years ago by trev

  • Description modified
  • Priority changed from Unknown to P3
  • Ready set

comment:2 Changed 5 years ago by trev

  • Owner set to trev

comment:3 Changed 5 years ago by trev

I created a repository for this script at https://github.com/palant/adblockplus-website-comparison. The current version merely compares the list of files, but this already found some broken/outdated content. With these issues resolved, the only remaining difference is he/contribute: this translation is just below the 30% boundary in Anwiki (27%) but above it in the new CMS (32%). That's because only the number of translated strings matters, not the amount of translated text, and Anwiki had some sentences split up into multiple strings.
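
To illustrate why the same translation can land on different sides of the boundary, here is a small sketch with made-up data. The function `translated_share`, the sample strings and the way the units are split are all hypothetical and are not taken from Anwiki or the CMS; only the 30% boundary comes from the comment above.

```python
THRESHOLD = 0.30  # the 30% boundary mentioned above


def translated_share(units):
    """Share of translation units that have a translation.

    `units` is a list of (source_text, translation_or_None) pairs.
    Only the number of units counts, not the length of their text.
    """
    translated = sum(1 for _, translation in units if translation)
    return translated / len(units)


# Hypothetical page: one translated unit and two untranslated ones.
as_sentences = [
    ("Join the project.", "translated text"),
    ("Read the guide first. Then pick a task.", None),
    ("Ask questions in the forum.", None),
]

# Same page, but the second unit is split into two strings.
as_fragments = [
    ("Join the project.", "translated text"),
    ("Read the guide first.", None),
    ("Then pick a task.", None),
    ("Ask questions in the forum.", None),
]

# 1 of 3 units is translated in the first case, 1 of 4 in the second.
print(translated_share(as_sentences) >= THRESHOLD)
print(translated_share(as_fragments) >= THRESHOLD)
```

The amount of translated text is identical in both cases; only the number of strings it is divided by changes.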

comment:4 Changed 5 years ago by fhd

  • Keywords 2015q2 added

comment:5 Changed 5 years ago by trev

  • Resolution set to fixed
  • Status changed from new to closed

At this point the script produces a diff that is "merely" 112 kB large. This is due to the following issues:

  • We have lots of partially translated sentences in Anwiki, simply because in Anwiki the translation unit often wasn't a sentence but rather the contents of a tag. All these partial translations will be reset to 100% English when imported into the CMS, which is a good thing.
  • Contents of a <pre> tag aren't considered translatable by the conversion script and will be reset to English. This affects two examples, one on the filters page and another on the faq_internal page. It is hard to convert these to the CMS markup due to the HTML tags used there. Arguably, that's not worth fixing - I think that both examples shouldn't have been translatable in the first place.
  • On some occasions the translators left inline tags empty, so instead of Please click <a>here</a> they have Please click here <a></a>. Anwiki will render these tags empty whereas the CMS will have a space inside. This difference in rendering is irrelevant - it's a broken translation that needs to be fixed.
  • On some occasions the translators added or removed trailing/leading whitespace where it actually mattered. In the CMS this whitespace is no longer part of the translation, meaning that it is identical across all languages even if some translations are messed up in Anwiki. I've been fixing these issues in Anwiki but quite frankly it's a waste of time.
  • For Chinese, the whitespace actually should be different: an ideographic full stop isn't followed by a space, since an additional separator isn't necessary. This is something we currently cannot represent correctly in the CMS, so the comparison script will ignore this issue.
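
One way the whitespace-only differences from the last three points could be ignored is to normalize both sources before diffing. The following is a sketch under assumptions, not necessarily what the comparison script does: the function name `normalize` and the list of inline tags are hypothetical, and only the empty-tag and ideographic full stop cases are handled.

```python
import re

# Inline tag containing only whitespace, e.g. "<a> </a>".
# The back-reference \1 restricts this to a matching open/close pair.
# The tag list is an assumption made for illustration.
EMPTY_INLINE_TAG = re.compile(r"<(a|em|strong)\b([^>]*)>\s+</\1>")

# Spaces or tabs directly after an ideographic full stop (U+3002).
SPACE_AFTER_FULL_STOP = re.compile("\u3002[ \t]+")


def normalize(source):
    """Remove whitespace differences that the comparison should ignore."""
    # "<a> </a>" and "<a></a>" are treated as the same markup.
    source = EMPTY_INLINE_TAG.sub(r"<\1\2></\1>", source)
    # Whitespace after an ideographic full stop is dropped.
    source = SPACE_AFTER_FULL_STOP.sub("\u3002", source)
    return source
```

Applying `normalize` to both the Anwiki and the CMS output before comparing them would hide these cases from the diff without touching any other content.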

It seems that there is no way to get the output smaller, so I'm resolving this issue.

Last edited 5 years ago by trev