Opened 4 years ago

Closed 4 years ago

#3049 closed defect (fixed)

Canonical URLs don't fallback to some browser languages on adblockbrowser.org

Reported by: vickyyu Assignee: saroyanm
Priority: P2 Milestone:
Module: Websites Keywords:
Cc: vickyyu, trev, fhd, matze, saroyanm, sebastian, kzar Blocked By: #3076
Blocking: Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29327712/

Description (last modified by saroyanm)

Environment

Chrome 46 (Other browsers are probably affected as well.)
with any other language supported by adblockbrowser.org that have a region part in the language code, such as zh-CN

How to reproduce

Perform a GET request to https://adblockbrowser.org , sending the header Accept-Language: zh-CN

Observed behaviour

The page is in English.

Expected behaviour

The page should be in Chinese. (Perform a similar request to https://adblockplus.org for an example.)

Background

So far our web repositories delimit the language region part with an underscore _ instead of a hyphen -. Our Nginx configuration(s) expect this, and so don't function properly when the hyphen delimiter is used.

The problem is that the language names must exactly match what Crowdin expects for the translation synchronisation process to work and Crowdin insists on the hyphen delimiter. To become compatible with Crowdin we renamed the locale directories, causing this regression.

What to change

Rename the locale directories of web.adblockbrowser.org back to match web.adblockplus.org (basically revert d19bfbe47ed8, ensuring that the locale files creating since then are also moved).

This should fix this regression. In the mean time @Dave will work on #3076 adding the locale name mapping functionality to the CMS. (If translations need to be synchronised before the mapping functionality is added the locale directories will need to be manually renamed before and after the synchronisation.)

Change History (46)

comment:1 Changed 4 years ago by saroyanm

  • Priority changed from Unknown to P2

comment:2 Changed 4 years ago by saroyanm

  • Ready set

comment:3 Changed 4 years ago by saroyanm

  • Blocked By 3058 added

comment:4 Changed 4 years ago by kzar

I don't think this ticket is ready, saroyanm you mentioned in #3058 that this issue is somehow related to the underscores in locale names. Please could you update this issue add some more information about the problem?

Last edited 4 years ago by kzar (previous) (diff)

comment:5 Changed 4 years ago by saroyanm

  • Description modified (diff)
  • Ready unset
  • Type changed from defect to change

It was "defect" ticket and describing the reproduction, but you are right, make sense to change into change ticket, also noticed another problem with another locale es-ES and es.

comment:6 follow-up: Changed 4 years ago by kzar

The ticket was better before, this _is_ a defect and we _do_ need steps for reproduction.

My problem was that in #3058 you gave this issue as the justification as to why we need to replace underscores with hyphens in locale names in the Crowdin synchronisation script. But this issue actually contained zero information about why that solution would help matters or what was actually causing the problem in the first place.

Both issues are now lacking in my opinion.

  • What is the problem. (The redirection for Chinese users isn't working from Adblock Browser? Which platform?)
  • How can we reproduce the problem? (Version numbers, exact URLs, precise steps etc.)
  • What is causing the problem. (Is the server properly sending the request for redirection? Is the client acting properly? Why are underscores triggering this problem?)
  • What is the correct solution? Not what's a quick fix that works for now. (Should we fix the server so that it sends redirections properly when there's a underscore in the URL? Should we correct an issue with our browser on a certain platform? etc etc...)

I don't want to rush into making a change like "Quick replace all the underscores" without understand what's actually happening.

Last edited 4 years ago by kzar (previous) (diff)

comment:7 in reply to: ↑ 6 Changed 4 years ago by saroyanm

  • Cc trev fhd added; saroyanm removed

Replying to kzar:

  • What is the problem. (The redirection for Chinese users isn't working from Adblock Browser? Which platform?)

No it's about default redirection, as you can see we are providing users the translation according to the Browser language default setup, in other words checking for "Accept-Language" headers.

  • How can we reproduce the problem? (Version numbers, exact URLs, precise steps etc.)

Change the browser default language setting to Chinese and navigate to adblockbrowser.org (I can return the old description of the ticket if it more descriptive for you, currently I'm referring to server side detection of "Accept-Language").

  • What is causing the problem. (Is the server properly sending the request for redirection? Is the client acting properly? Why are underscores triggering this problem?)

The problem is caused because I've changed the [langnames] in settings.ini to match the crowdin locales, but initially in the website requirements were mentioned that we need to use same [langnames] as in the adblockplus.org, I can't show the diff while the web.adblockbrowser.org is not yet visible for public, but you can check "changeset: 16:d19bfbe47ed8".

  • What is the correct solution? Not what's a quick fix that works for now. (Should we fix the server so that it sends redirections properly when there's a underscore in the URL? Should we correct an issue with our browser on a certain platform? etc etc...)

I think the consistent solution would be, that we can assign also crowdin locales to the [langnames], maybe similar syntax:
[langnames]
zh-CN = 中文(简体) = zh_CN ?
@fhd maybe you can have an opinion regarding this as well ?

I don't want to rush into making a change like "Quick replace all the underscores" without understand what's actually happening.

Please no rush. This problem is regression because of rush.

comment:8 Changed 4 years ago by kzar

Well if you don't want to rush into replacing underscores with hyphens before the problem is properly understood - or at least documented - why has issue #3058 already been filed to do just that?

The bullet points in my previous comment were supposed to illustrate the kind of things that needed to go into the issue's description. Thanks for your replies to my points but I'm still not clear about what is actually causing the issue and I still think the descriptions for this issue and #3058 are lacking.

comment:9 Changed 4 years ago by sebastian

  • Cc matze saroyanm sebastian added
  • Component changed from Websites to Infrastructure

Both, the name of the generated directories as well as the Accept-Language header use dashes, not underscores. So this hasn't anything to do with that. It doesn't even has anything to with the CMS or web page contents at all. It's also not limited to Chinese.

Also note that it works fine for all languages on adblockplus.org. However, there are simply no rewrite rules configured for adblockbrowser.org to fallback to the Accept-Language header if the language prefix is missing in the URL.

comment:10 Changed 4 years ago by sebastian

  • Description modified (diff)
  • Summary changed from The language on adblockbrowser.org is not set to Chinese when Chinese users browse to Canonical URLs don't fallback to browser language on adblockbrowser.org
  • Type changed from change to defect

comment:11 Changed 4 years ago by sebastian

  • Description modified (diff)

comment:12 follow-up: Changed 4 years ago by saroyanm

For German, works fine with me, I've noticed that affects:
Portuguese, Spanish and Chinese, but works fine for German I'd say.

comment:13 Changed 4 years ago by sebastian

  • Description modified (diff)

comment:14 in reply to: ↑ 12 ; follow-up: Changed 4 years ago by sebastian

Not for me. And honestly, I doubt that it works for you with German.

comment:15 in reply to: ↑ 14 Changed 4 years ago by saroyanm

Replying to sebastian:

Not for me. And honestly, I doubt that it works for you with German.

Tested again using Firefox 40.0.3, step for reproduction:

  • Navigate to about:preferences#content
  • Open language selection dialog from Languages section.
  • Remove all languages.
  • Add German [de]
  • Restart browser
  • clear cache
  • navigate to https://adblockbrowser.org

comment:16 Changed 4 years ago by matze

German works for me as well, in both production and development.

comment:17 Changed 4 years ago by matze

Yep, any 2wo-letter-locale seems to work flawlessly.

comment:18 follow-up: Changed 4 years ago by matze

In web.adblockplus.org the specific locales in locale/ are using underscores, while web.adblockbrowser.org uses dashes. I could update the Nginx config to understand both, but I'd actually prefer a single point of failure (read: a convention with one way only).

Do you mind updating the web.adblockbrowser.org repository accordingly?

Last edited 4 years ago by matze (previous) (diff)

comment:19 Changed 4 years ago by sebastian

  • Description modified (diff)
  • Summary changed from Canonical URLs don't fallback to browser language on adblockbrowser.org to Canonical URLs don't fallback to some browser languages on adblockbrowser.org

Indeed, after clearing the cache, it also works for me, with 2-letter language codes like these of German and Italian, however not for these with a region part such as zh-CN. I updated the issue description.

comment:20 Changed 4 years ago by sebastian

I see. I currently don't have access for web.adblockbrowser.org. But feel free to go ahead.

comment:21 in reply to: ↑ 18 Changed 4 years ago by saroyanm

Replying to matze:

In web.adblockplus.org the specific locales in locale/ are using underscores, while web.adblockbrowser.org uses dashes. I could update the Nginx config to understand both, but I'd actually prefer a single point of failure (read: a convention with one way only).

Do you mind updating the web.adblockbrowser.org repository accordingly?

I only can update web.adblockbrowser.org accordingly if we can make update similar to #3058, so I can run synchronization script. Or if we can have solution in current comment -> https://issues.adblockplus.org/ticket/3058#comment:2
But you are right we need to keep "Underscore" style, because doing changes on web.adblockplus.org can affect SEO.

So the answer to your questions is:
Yes, I'll update the locales accordingly after I'll figure out with @Dave and @Sebastian how to make translation synchronization more manageable, so we will not be dependent to the Crowdin format and yes configuring Nginx to understand both would be nice while we will keep website implementation consistent.

Last edited 4 years ago by saroyanm (previous) (diff)

comment:22 Changed 4 years ago by saroyanm

All affected locales:

  • es-ES
  • zh-CN
  • zh-TW
  • pt-PT
  • pt-BR

comment:23 Changed 4 years ago by matze

Wouldn't it be easier then to just convert the web.adblockplus.org locales to use dashes as well, and update the Nginx configuration accordingly?

comment:24 Changed 4 years ago by matze

And please note that the use of underscores or dashes in the locale/ directories does NOT affect the URL whatsoever, so there shouldn't be any SEO-related implications.

comment:25 Changed 4 years ago by kzar

  • Cc kzar added

comment:26 Changed 4 years ago by sebastian

I don't see why we cannot simply update web.adblockbrowser.org to use underscores for the dirnames in the locales subfolder. The CrowdIn synchronization, beside not being used yet, was tested against web.adblockplus.org where we already use underscores.

comment:27 Changed 4 years ago by kzar

@sebastian Same for me, perhaps we're misunderstanding the issue?

comment:28 Changed 4 years ago by kzar

OK I finally do now understand. With our current Nginx configuration, for whatever reason, the Accept-Language header is not being used properly for locales with a region part when the delimiter is - instead of _.

Currently with the Crowdin synchronisation script the locale names are taken straight from the directory names inside the locales directory and Crowdin is quite picky about them. To make the synchronisation work we have to use - as the delimiter.

One option would be to add support for a locale mapping section to website settings.ini files as @saroyanm suggested. Another option would be to simply fix the Nginx configuration so language support when the region delimiter is - as @matze suggested.

Note: The web.adblockbrowser.org repository is the first to use the translation synchronisation script. When we start using it with web.adblockplus.org as well we will have the same issue there.

@matze Would it be complicated to improve the Nginx configuration to handle the hyphen delimiter as well / instead of underscores? If not I suggest we should do that.

If it's complicated or if we have other problems with the locale names Crowdin expects I think that @saroyanm's suggestion is reasonable and I'd be happy to file a ticket. (I suspect the adding the locale mapping functionality would be more complicated than improving the Nginx configuration however.)

(@saroyanm Sorry for the confusion to start with, it took me a while to understand what was going on.)

comment:30 Changed 4 years ago by sebastian

So the appropriate the solution would be to use dashes everywhere, and changing the server configuration to rely on dashes instead of underscores. IMO no reason to file a separate issue.

comment:31 Changed 4 years ago by kzar

@matze Yes, that was the comment I was referring to, you implied it wouldn't be complicated to improve the Nginx configuration but I hoped you might confirm it explicitly. Anyway, no matter.

@sebastian I agree, let's just do that.

comment:32 Changed 4 years ago by kzar

  • Description modified (diff)

comment:33 Changed 4 years ago by kzar

  • Description modified (diff)

comment:34 Changed 4 years ago by kzar

  • Description modified (diff)

comment:35 Changed 4 years ago by saroyanm

Guys sorry for late reply:

So the appropriate the solution would be to use dashes everywhere, and changing the server configuration to rely on dashes instead of underscores. IMO no reason to file a separate issue.

Please note that this will not fix the issue, but only part of it, also I'd be fan of keeping underscores for now while we were using underscores beforehand, could be a reason behind that, maybe make sense to ask @fhd if he remember if there is a reason or not. We will need to make consistent changes, but would be great to fix all the issues ex. es-ES locale.

If it's complicated or if we have other problems with the locale names Crowdin expects I think that @saroyanm's suggestion is reasonable and I'd be happy to file a ticket. (I suspect the adding the locale mapping functionality would be more complicated than improving the Nginx configuration however.)

It would probably be complicated but it will fix the issue itself which setting up ngnix to handle both dashes and underscores will not, for example there is no locale es-ES (the correct one is es) which crowdin is using, so we will need to map that somehow, probably there will be other similar cases.

Wouldn't it be easier then to just convert the web.adblockplus.org locales to use dashes as well, and update the Nginx configuration accordingly?

Handling both dashes and underscores will solve some of the issues, but not all of them, so will be probably good to have feature, also make sense to understand if there was reason for using Underscores for Adblockplus.org, before changing to dashes.

comment:36 Changed 4 years ago by kzar

I don't think Spanish will be a problem with the proposed fix, me and Sebastian have just done some testing. For example try this:

curl -s --header 'Accept-Language: es-ES' https://adblockplus.org | grep 'html lang='

comment:37 Changed 4 years ago by sebastian

The reason why this works for adblockplus.org is that the directory with the Spanish translation there is called es, and the web server as it is configured falls back to es if the Accept-Language header indicates es-ES for example. However, CrowdIn, and therefore adblockbrowser.org, calls the directory es-ES and the fallback doesn't work the other way around.

comment:38 Changed 4 years ago by kzar

As Sebastian noticed Crowdin also support adding custom locales. So we could add Spanish with the code es for the Crowdin project, instead of adding this locale mapping functionality our end. I have just tested this however and there are two problems:

1.) Crowdin's Supported languages API does not consider custom locales. So the synchronisation script will skip uploading the translations for any custom locales without modification. (I'm not sure how we can cleanly work around this in the script either, without having some list of special cases.)
2.) Crowdin's web interface apparently doesn't even work properly with custom locales. For example from the project's home page my custom locale was listed with the others but clicking on it produces a 404!

(The synchronisation script did successfully upload the Spanish translations using the locale code of es once I had hacked where the script checks which locales Crowdin supports.)

So I guess if using the locale code of es-ES is not going to work we might have to add this locale name mapping functionality to the CMS after all :(

comment:39 Changed 4 years ago by sebastian

On a side note, we use Chrome's i18n format here. Therefore the underscores. So on second thought, while our CMS and websites technically don't need to be 100% compatible with that format, I think it's still not worth breaking consistency here. In particular, since it's in fact not as simple as making the server recognize dashes instead of underscores (and adding custom locales on CrowdIn).

So IMO we should rename following locales like below in web.adblockbrowser.org and map them the same way when syncing with CrowdIn, leaving other websites unchanged.

  • es-ES -> es
  • pt-PT -> pt
  • pt-BR -> pt_BR
  • zh-CN -> zh_CN
  • zh-TW -> zh_TW
Last edited 4 years ago by sebastian (previous) (diff)

comment:40 Changed 4 years ago by kzar

  • Blocked By 3076 added; 3058 removed
  • Component changed from Infrastructure to Websites
  • Description modified (diff)

comment:41 Changed 4 years ago by saroyanm

  • Ready set

Looks good to me, I can implement this changes and push to live, the only thing is the ticket will stay open until @dave will finish with translation script update, but shouldn't be a problem I think in this case, if the ticket will stay open for longer time I'll remove dependencies.

comment:42 Changed 4 years ago by saroyanm

  • Description modified (diff)

comment:43 Changed 4 years ago by saroyanm

  • Owner set to saroyanm
  • Review URL(s) modified (diff)

comment:44 Changed 4 years ago by saroyanm

  • Status changed from new to reviewing

comment:46 Changed 4 years ago by saroyanm

  • Resolution set to fixed
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.