Opened 5 years ago

Closed 4 years ago

#1170 closed change (fixed)

[adblockplus.org Anwiki to CMS migration] Migrate content

Reported by: trev Assignee: kzar
Priority: P3 Milestone:
Module: Websites Keywords:
Cc: manvel@…, greiner, kzar Blocked By:
Blocking: #2035 Platform: Unknown
Ready: yes Confidential: no
Tester: Verified working: no
Review URL(s):

http://codereview.adblockplus.org/5636796054503424/

Description

Background

We want to replace Anwiki with our own CMS, which is implemented in the sitescripts repository. The content needs to be converted to the new format. At this point we can use raw HTML rather than converting it to Markdown.

What to change

We don't need to access Anwiki directly; the content mirror in the www repository can be used instead. The converted content should be added to the wwwnew repository (which will probably be renamed to web.adblockplus.org later).

Attachments (1)

convert.py (11.2 KB) - added by trev 5 years ago.
Conversion script


Change History (22)

Changed 5 years ago by trev

Conversion script

comment:1 Changed 5 years ago by trev

I attached the conversion script I have so far. I seem to remember that it didn't merge translatable strings in an optimal way, but looking at the generated results I could only find an issue in getting_started_installation.json:

  "s10": {
    "message": "There is a number of other problems that can occur during extension installation. Please have a look at the"
  },
  "s11": {
    "message": "http://kb.mozillazine.org/Unable_to_install_themes_or_extensions_-_Firefox"
  },
  "s12": {
    "message": "MozillaZine Knowledge Base"
  },
  "s13": {
    "message": "for solutions."
  }

This sentence has been split up into four strings; there should really be only two:

  "s10": {
    "message": "There is a number of other problems that can occur during extension installation. Please have a look at the <a>MozillaZine Knowledge Base</a> for solutions."
  },
  "s11": {
    "message": "http://kb.mozillazine.org/Unable_to_install_themes_or_extensions_-_Firefox"
  }
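A minimal sketch of the merging step described above, not the logic in convert.py: it assumes the paragraph has already been broken into text and link fragments, and the function name `merge_link_sentence` and the tuple layout are made up for illustration. Fragments are joined with single spaces, which is a simplification (punctuation directly after a link would need extra care).

```python
def merge_link_sentence(parts):
    """Merge text/link fragments into one translatable sentence.

    parts is a list of ("text", text) or ("link", href, label) tuples
    (a hypothetical layout). Returns the merged message, with each link
    reduced to an <a> placeholder, plus the link targets as separate
    strings so that they stay out of the translatable sentence.
    """
    pieces = []
    hrefs = []
    for part in parts:
        if part[0] == "text":
            pieces.append(part[1].strip())
        else:
            _, href, label = part
            pieces.append("<a>%s</a>" % label.strip())
            hrefs.append(href)
    return " ".join(pieces), hrefs


parts = [
    ("text", "Please have a look at the"),
    ("link",
     "http://kb.mozillazine.org/Unable_to_install_themes_or_extensions_-_Firefox",
     "MozillaZine Knowledge Base"),
    ("text", "for solutions."),
]
message, hrefs = merge_link_sentence(parts)
```

Here `message` becomes a single string containing `<a>MozillaZine Knowledge Base</a>` and `hrefs` holds the one URL, which matches the two-string result wanted above.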

The other problem is that this script only considers the default page type. We also have the custom interface and preftable types in Anwiki; the pages using these will have to be replaced by Jinja2 templates that are fed some data. There is also the subscriptions page, which uses the subscriptionlist type. Here we definitely won't get around manual conversion (the main part of that page is already being pre-generated via a template, Anwiki only adds the localizable parts).

comment:2 Changed 5 years ago by kzar

  • Owner set to kzar

comment:3 Changed 5 years ago by kzar

I've been working on the conversion script under this repo https://bitbucket.org/adblockplus/website-converter and I've otherwise been making manual changes to my local copy of wwwnew. Going away tomorrow for a few days but will submit a code review request when it's "done".

Making good progress anyway, I have fixed the issue you mentioned with getting_started_installation.json and several other bits and bobs.

- Dave.

comment:4 Changed 5 years ago by kzar

@trev I've been looking at the interface type and I'm not sure of the best way to go about it. I've created a template no problem, but what's the best way to feed data through into it?

Variables assigned at the top of the page can be accessed by the template but they are limited to one line strings. This makes them useless unless I serialise the interface data as JSON or similar. (Passing through the interface details like this would mean we couldn't offer translations as well.) I have managed to do this by creating JSON encode / decode filters but it seems like a messy approach.
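For illustration, the one-line JSON workaround could look roughly like this. It is a sketch under assumptions: the helper names `to_json` and `from_json` are hypothetical, the shape of the interface data is invented, and the way the CMS would register them as filters is not shown (with plain Jinja2 they could be added to an environment's `filters` dictionary).

```python
import json


def to_json(value):
    """Serialise data to a single-line JSON string.

    json.dumps emits no line breaks unless indent is given, so the
    result fits into a one-line page variable.
    """
    return json.dumps(value, sort_keys=True)


def from_json(text):
    """Decode the one-line string back into data inside the template."""
    return json.loads(text)


# Hypothetical interface description; the real structure may differ.
interface = {
    "properties": [{"type": "PRInt32", "name": "foo"}],
    "methods": [{"type": "IBar", "name": "getBar"}],
}
one_line = to_json(interface)
```

The round trip keeps the structure intact, but the descriptions end up inside an opaque string, which is why this route leaves no room for translations.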

Another option was to create each interface page as a jinja2 .tmpl itself and assign the interface data in the page using the set tag {% set interface = [...] %}. The problem with that is that, as we are using our own method of template inheritance instead of jinja's {% extends %} tag, the page is rendered and then passed through to the interface template as the body string, so there is no chance of sharing assigned variables. (I also tried using the extends tag but that resulted in an exception.)

The same problem came up when using a jinja macro defined in the interface template from the page: because template and page are rendered separately, we don't have access to the template's macros from the page.

Another idea I had was to define the interface details in the translation strings JSON file for the page. Problem is that even though the file is JSON if you're using the standard translation functions the structure is quite limited. (ie key -> "message" -> string). With this limitation I managed to store most of the interface data under keys like "property0name" "property0type" but it gets ugly fast. Also as jinja2 only has for loops out of the box I had to store the property and method count in variables at the top of the page. Then with each method having a variable number of arguments I would have to store each method's argument count somewhere, and we'd end up with string names like "method3argument2name"... not ideal!

Finally I figured out that you can access arbitrary data using the {{ localedata }} template variable if it's assigned to the message key. We could store our interface data like so: { "interface": {"message": {"properties": [...], "methods": [...]}}} in the JSON. It seems like an awful hack though and I have no idea what it would do to the translation logic elsewhere. (What if we want to create a German version of the page?!) Edit: After more experimentation I realised that the data inside the message property, although parsed as JSON, is squashed back down to a string before it makes it to the template.

What do you think?


comment:5 Changed 5 years ago by kzar

OK So I've gotten it working by storing each property and method in loads of separate translation strings. The page renders properly and translations work as before. It's not ideal though for these reasons:

  • There are absolutely loads of strings like "method1argument0description" "property4name" etc, it's kind of a mess.
  • In the template there's no way to know how many properties, methods or method arguments there are, so I've had to just iterate from 0 to 100, each time only displaying the content if the property / method / method argument exists! (There's no continue / break functionality in jinja2 by default. There is an extension we could add to the CMS though.)
  • The description fields vary in structure so I've had to store the HTML for them along with the strings in the JSON. I then unescape the HTML when rendering using a new filter.
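To make the numbered-key scheme above concrete, here is a rough Python sketch. It is not the code from the review: `flatten_interface`, `read_properties`, `MAX_ITEMS` and the shape of the input data are assumptions for illustration, and the reader loop mimics the template's "iterate up to 100 and skip what's missing" behaviour rather than using a break.

```python
MAX_ITEMS = 100  # upper bound the template iterates to (assumed)


def flatten_interface(interface):
    """Flatten nested interface data into numbered translation strings."""
    strings = {}
    for i, prop in enumerate(interface.get("properties", [])):
        for field, value in prop.items():
            strings["property%d%s" % (i, field)] = {"message": value}
    for i, method in enumerate(interface.get("methods", [])):
        for field, value in method.items():
            if field != "arguments":
                strings["method%d%s" % (i, field)] = {"message": value}
        for j, arg in enumerate(method.get("arguments", [])):
            for field, value in arg.items():
                key = "method%dargument%d%s" % (i, j, field)
                strings[key] = {"message": value}
    return strings


def read_properties(strings):
    """Collect property names the way the template has to: probe every
    index and only use the ones that exist."""
    names = []
    for i in range(MAX_ITEMS):
        key = "property%dname" % i
        if key in strings:
            names.append(strings[key]["message"])
    return names


example = {
    "properties": [{"name": "foo", "type": "PRInt32"}],
    "methods": [{
        "name": "getBar",
        "arguments": [{"name": "id", "description": "Identifier"}],
    }],
}
flat = flatten_interface(example)
```

Even this tiny example produces keys such as `method0argument0description`, which shows how quickly the string file grows.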

If you're OK with all that then I guess we're done on the interfaces, if not I think the CMS needs to be improved somehow to facilitate this kind of situation. (Unless you can think of another way to do this that I haven't thought of?)

Thanks, Dave.

comment:6 Changed 5 years ago by kzar

(Same problem but possibly worse for the preference table page, we're going to have loads of translation strings like "section4preference22description", "section3preference4name" etc.)


comment:7 Changed 5 years ago by kzar


comment:8 Changed 5 years ago by kzar

OK I've been making some progress:

  • All the pages that I can see look the same with wwwnew as they do with anwiki
  • The subscription, interface and preftable pages work including with translations.
  • The strings are stored in a more optimal way, I've managed to avoid them being split up so much.
  • I've removed some files that are no longer required, probably more to go though (it's tricky to spot them all.)
  • More that I forget...

I've attempted to submit a codereview patchset here http://codereview.adblockplus.org/6021528219025408/ . Unfortunately the tool mostly failed because the patchset is so large. The server returned "500 internal server error" quite a few times.

Should I work on some Javascript to detect the right locale / browser? Anything else?

- Dave

comment:9 Changed 5 years ago by trev

Dave, sorry about the delay. I think you should really only submit convert.py for review, plus the templates you created - we can generate its output ourselves.

Concerning interface pages: yes, numbered properties are a huge and unmaintainable hack. Instead, the interface pages should be templates, something like this:

{%
  set members = {
    "readonly PRInt32 foo": "$fooDescription$",
    "IBar getBar(AString id)": "<p>$barDescription1$</p>\n\n<p>$barDescription2$</p>",
  }
%}
{% include 'interface' %}

The template should be able to recognize properties and methods automatically here, as well as distinguishing types, names and modifiers (by using a custom filter if it has to be). Note that none of this should be localized - only the descriptions. Note also that markup like paragraphs should not be part of the localization either. So we probably want description templates to be defined along with the properties/methods as well.
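As a sketch of what such a custom filter might do, the function below splits a member signature into kind, modifiers, type, name and arguments. The name `parse_member` and the returned dictionary layout are hypothetical, and it only handles the simple shapes shown in the example above (whitespace-separated words, optional parenthesised argument list).

```python
def parse_member(signature):
    """Split 'readonly PRInt32 foo' or 'IBar getBar(AString id)' into parts.

    A signature with parentheses is treated as a method, anything else
    as a property. The last word before the parentheses is the name,
    the one before it the type, and anything earlier is a modifier.
    """
    head, paren, rest = signature.partition("(")
    words = head.split()
    member = {
        "kind": "method" if paren else "property",
        "modifiers": words[:-2],
        "type": words[-2],
        "name": words[-1],
        "arguments": [],
    }
    if paren:
        # Drop the closing parenthesis, then split the argument list.
        for raw in rest.strip().rstrip(")").split(","):
            parts = raw.split()
            if parts:
                member["arguments"].append(
                    {"type": parts[-2], "name": parts[-1]}
                )
    return member


prop = parse_member("readonly PRInt32 foo")
method = parse_member("IBar getBar(AString id)")
```

With something like this, the keys of `members` carry all the non-localized information and only the description values need translation.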

It's the same thing with preferences. Subscriptions on the other hand are a special case. The HTML file you included there is currently being generated via the sitescripts/subscriptions/template/subscriptionList.html template - it's a hack for the sake of Anwiki. We should use that template (in a slightly modified form to include localized strings) directly. The data that normally goes into this template can be serialized into pages/prefs.tmpl for now, later it should come directly from the [subscriptionlist repository](https://hg.adblockplus.org/subscriptionlist/).

comment:10 Changed 5 years ago by kzar

No problem! Thanks, the approach you mentioned for getting data through into the interface template works and is much much cleaner :) (The only part that doesn't work is the automatic translations of strings like $fooDescription$, these are not replaced automatically for .tmpl files like they are for .raw files apparently. No big deal though as we have the translate filter.)
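For reference, replacing `$name$` placeholders by hand is a small regex substitution. This is a stand-in sketch, not the CMS's translate filter: `resolve_placeholders` is a hypothetical helper, and the `strings` dictionary is assumed to follow the key -> "message" -> string layout of the locale JSON files.

```python
import re

# Matches placeholders such as $fooDescription$.
PLACEHOLDER_RE = re.compile(r"\$(\w+)\$")


def resolve_placeholders(text, strings):
    """Replace each $name$ with strings[name]["message"].

    Unknown placeholders are left untouched so that missing
    translations stay visible in the output.
    """
    def replace(match):
        entry = strings.get(match.group(1))
        return entry["message"] if entry else match.group(0)

    return PLACEHOLDER_RE.sub(replace, text)


strings = {
    "barDescription1": {"message": "First paragraph."},
    "barDescription2": {"message": "Second paragraph."},
}
html = resolve_placeholders(
    "<p>$barDescription1$</p>\n\n<p>$barDescription2$</p>", strings
)
```

The markup stays in the template value and only the text inside the paragraphs comes from the string file, in line with the suggestion that paragraphs should not be part of the localization.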

comment:11 Changed 5 years ago by kzar

@palant OK I've got another code review ready, it only contains the manually edited files and the interface etc pages have been implemented in a much better way. http://codereview.adblockplus.org/5636796054503424/

comment:12 Changed 5 years ago by trev

  • Component changed from Infrastructure to Websites

comment:13 Changed 5 years ago by saroyanm

  • Cc manvel@… added

comment:14 Changed 5 years ago by kzar

  • Review URL(s) modified (diff)

comment:15 Changed 5 years ago by greiner

  • Cc greiner added

comment:16 Changed 5 years ago by kzar

  • Owner changed from kzar to trev

So current status to make handover easier @palant:

  • The conversion script is in GitHub and I've made sure to push everything now. It's kind of a mess but does a pretty good job, IIRC all the content is brought across even interfaces and stuff.
  • There's a code review waiting for the manual changes I've made to the wwwnew repo. In combination with the script it brings the content pretty much up to date with the live site. Browsing around I couldn't find any differences. (If you want some extra context about my changes check out these commits in GitHub.)
  • The code to detect which browser the user uses definitely still hasn't been written, other than that I'm not sure what else needs to be done other than obviously lots of testing!

Hope that helps, good luck!

comment:17 Changed 5 years ago by kzar

  • Cc kzar added

comment:18 Changed 5 years ago by kzar

  • Owner changed from trev to kzar
  • Summary changed from Migrate adblockplus.org content from Anwiki to our CMS to [adblockplus.org Anwiki to CMS migration] Migrate content

comment:19 Changed 5 years ago by kzar

  • Blocking 2035 added

comment:20 Changed 5 years ago by kzar

  • Status changed from new to reviewing

comment:21 Changed 4 years ago by trev

  • Resolution set to fixed
  • Status changed from reviewing to closed

Should be fixed now:

There are still a number of problems; follow-up issues blocking #2035 should be filed for these.
