Opened on 04/15/2015 at 09:34:50 AM
Closed on 05/06/2015 at 07:57:50 PM
Last modified on 05/06/2015 at 08:19:14 PM
#2317 closed change (fixed)
[adblockplus.org Anwiki to CMS migration] Write script to compare page sources for the live / beta sites
Reported by: | kzar | Assignee: | trev |
---|---|---|---|
Priority: | P3 | Milestone: | |
Module: | Unknown | Keywords: | 2015q2 |
Cc: | trev | Blocked By: | |
Blocking: | #2035 | Platform: | Unknown |
Ready: | yes | Confidential: | no |
Tester: | | Verified working: | no |
Review URL(s): | | | |
Description (last modified by trev)
Background
We are migrating our website adblockplus.org away from Anwiki to our own CMS. As an aid to the QA team we want to write a script to compare the source code generated for every page to the existing website.
What to change
Write a script to do that. Some notes:
- The script should list differences in the code for each page, excluding the base templates.
- The script should compare the static files generated by the CMS with current adblockplus.org content.
- The script should output some useful diff format that is hopefully easy to look through to spot problems.
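The notes above could be sketched roughly as follows. This is only an illustration, not the script that was eventually written: it assumes both sites have already been saved as static files in two local directories, and the names `compare_trees`, `list_files`, `live` and `beta` are hypothetical. It makes no attempt to exclude the base templates.

```python
import difflib
from pathlib import Path


def list_files(root):
    """Return the relative POSIX paths of all files below root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_trees(live_dir, beta_dir):
    """Yield report lines for every file that is missing or differs."""
    live_root, beta_root = Path(live_dir), Path(beta_dir)
    live_files, beta_files = list_files(live_root), list_files(beta_root)
    for name in sorted(live_files | beta_files):
        if name not in beta_files:
            yield f"Only in live: {name}\n"
        elif name not in live_files:
            yield f"Only in beta: {name}\n"
        else:
            old = (live_root / name).read_text(encoding="utf-8").splitlines(keepends=True)
            new = (beta_root / name).read_text(encoding="utf-8").splitlines(keepends=True)
            # unified_diff yields nothing when both files are identical
            yield from difflib.unified_diff(old, new, f"live/{name}", f"beta/{name}")
```

Writing the result with `sys.stdout.writelines(compare_trees("live", "beta"))` would give a unified diff per page, which is one candidate for a format that is easy to scan.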
Attachments (0)
Change History (5)
comment:1 Changed on 04/15/2015 at 10:54:06 AM by trev
- Description modified (diff)
- Priority changed from Unknown to P3
- Ready set
comment:2 Changed on 04/17/2015 at 05:24:45 PM by trev
- Owner set to trev
comment:3 Changed on 04/17/2015 at 08:54:51 PM by trev
comment:4 Changed on 04/21/2015 at 09:12:15 AM by fhd
- Keywords 2015q2 added
comment:5 Changed on 05/06/2015 at 07:57:50 PM by trev
- Resolution set to fixed
- Status changed from new to closed
At this point the script produces a diff that is "merely" 112 kB large. This is due to the following issues:
- We have lots of partially translated sentences in Anwiki, simply because in Anwiki the translation unit often wasn't a sentence but rather the contents of a tag. All these partial translations are reset to 100% English when imported into the CMS, which is a good thing.
- Contents of a <pre> tag aren't considered translatable by the conversion script and will be reset to English. This affects two examples, one on the filters page and another on the faq_internal page. It is hard to convert these to the CMS markup due to the HTML tags used there. Arguably, that's not worth fixing - I think that both examples shouldn't have been translatable in the first place.
- On some occasions the translators left inline tags empty, so instead of Please click <a>here</a> they have Please click here <a></a>. Anwiki will render these tags empty whereas the CMS will have a space inside. This difference in rendering is irrelevant - it's a broken translation that needs to be fixed.
- On some occasions translators added or removed trailing/leading whitespace where it actually mattered. In the CMS this whitespace is no longer part of the translation, meaning that it is identical across all languages even if some translations are messed up in Anwiki. I've been fixing these issues in Anwiki but quite frankly it's a waste of time.
- For Chinese, the whitespace actually should be different: an ideographic full stop isn't followed by a space, so an additional separator isn't necessary. This is something we currently cannot have correctly in the CMS, so the comparison script will ignore this issue.
It seems that there is no way to get the output smaller, so I'm resolving this issue.
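For illustration, two of the differences listed above (whitespace inside otherwise empty inline tags, and the space after an ideographic full stop) could be normalized on both sides before diffing. This is a hedged sketch only: the ticket does not say how the comparison script ignores these cases, and the names `normalize`, `EMPTY_INLINE` and `SPACE_AFTER_FULL_STOP`, as well as the list of inline tags, are assumptions.

```python
import re

# Inline element containing only whitespace, e.g. <a href="x"> </a>.
# The set of tag names is an assumption made for this sketch.
EMPTY_INLINE = re.compile(r"<(a|em|strong|span)\b([^>]*)>\s+</\1>")

# Spaces or tabs directly after an ideographic full stop (U+3002).
SPACE_AFTER_FULL_STOP = re.compile(r"\u3002[ \t]+")


def normalize(html):
    """Remove rendering differences that are known to be irrelevant."""
    html = EMPTY_INLINE.sub(r"<\1\2></\1>", html)
    return SPACE_AFTER_FULL_STOP.sub("\u3002", html)
```

Applying `normalize` to both the Anwiki and the CMS output before comparing would keep such lines out of the diff, at the cost of also hiding any case where the difference is a real problem.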
Last edited on 05/06/2015 at 08:19:14 PM
by trev
I created a repository for this script at https://github.com/palant/adblockplus-website-comparison. The current version merely compares the list of files, which has already uncovered some broken/outdated content. With these issues resolved, the only remaining difference is he/contribute: this translation is just below the 30% threshold in Anwiki (27%) but above it in the new CMS (32%). That's because only the number of translated strings matters, not the amount of translated text, and Anwiki had some sentences split up into multiple strings.
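The effect of counting strings rather than text can be shown with a small calculation. The string counts below are invented purely for illustration (the ticket only gives the 27% and 32% figures and the 30% threshold), and the function names and the use of `>=` at the threshold are assumptions.

```python
THRESHOLD = 0.30  # the 30% boundary mentioned in the ticket


def translated_share(translated, total):
    """Share of translated strings, regardless of how long each string is."""
    return translated / total


def meets_threshold(translated, total, threshold=THRESHOLD):
    """Whether the share reaches the threshold (>= is an assumption)."""
    return translated_share(translated, total) >= threshold


# Hypothetical counts: the same 6 translated strings, but the finer-grained
# split has 22 strings in total where the coarser one has only 19.
fine_grained = translated_share(6, 22)
coarse = translated_share(6, 19)
```

With these made-up counts the finer-grained split falls below the threshold and the coarser one lands above it, even though the amount of translated text has not changed.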