Changes between Version 7 and Version 10 of Ticket #6647


Timestamp:
05/06/2018 02:41:33 PM (14 months ago)
Author:
sebastian
  • Ticket #6647

    • Property Type changed from defect to change
    • Property Component changed from Core to Platform
    • Property Owner changed from mjethani to sebastian
    • Property Summary changed from Filters containing Unicode variation selectors do not match to Stop converting domains from punycode to unicode
  • Ticket #6647 – Description

    v7 v10  
    1 === Environment === 
    2 Adblock Plus 3.0.3 
     1=== Background === 
    32 
    4 === How to reproduce === 
    5  1. Add the filter `❤️` 
    6  2. Visit https://xn--i-7iq.ws/ 
     3In Adblock Plus we go to considerable lengths to convert the domains in the URLs reported by the browser from punycode (e.g. `xn--i-7iq.ws`) to unicode (e.g. `i❤.ws`), so that filters can be written using the unicode representation (rather than punycode). This, however, comes with a performance penalty, while the benefits for filter list authors are questionable. 
    74 
    8 === Observed behaviour === 
    9 Subresources like https://xn--i-7iq.ws/js/bundle.js are not blocked 
     5The original idea was that it feels more natural for filter list authors to spell out IDN domains in their native alphabet (rather than bothering with an obscure representation like punycode). However, while the address bar may (or may not) render the domain using the native alphabet, as soon as one inspects the DOM, the source code or the HTTP requests, all domains are given in punycode encoding. 
    106 
    11 === Expected behaviour === 
    12 Subresources like https://xn--i-7iq.ws/js/bundle.js should get blocked 
     7Furthermore, things become particularly confusing with unicode characters that can be composed in different ways, resulting in different punycode, but looking the same when rendered as unicode (e.g. `❤️` vs `❤`). 
    138 
    14 === Notes === 
    15 Unlike the blue heart, the green heart, and others, the red heart is composed of [http://unicode.org/faq/vs.html two Unicode code points] for [https://stackoverflow.com/q/42679712 historical reasons]. The actual domain name is in fact i❤.ws. 
     9=== What to change === 
     10 
     11Replace `stringifyURL(url)` with `url.href`, and replace `getDecodedHostname(url)` with `url.hostname`. As a result, IDN (non-ASCII) domains given in a filter's pattern or `$domain` option are expected to be in punycode encoding.
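
To illustrate the proposed change, here is a minimal sketch, assuming a standard WHATWG `URL` object as provided by browsers and Node.js. The helper `isSameOrSubdomain` is hypothetical and only stands in for whatever domain matching the extension performs; it is not Adblock Plus code. The last lines show why the two heart spellings are easy to confuse: they render alike but are different strings.

```javascript
// Sketch only: `isSameOrSubdomain` is a hypothetical helper, not Adblock Plus code.
const url = new URL("https://xn--i-7iq.ws/js/bundle.js");

// The URL object already exposes the ASCII (punycode) form,
// so no decoding step is needed before matching.
const href = url.href;
const hostname = url.hostname;

// Returns true if `hostname` equals `filterDomain` or is a subdomain of it.
function isSameOrSubdomain(hostname, filterDomain) {
  return hostname === filterDomain || hostname.endsWith("." + filterDomain);
}

// A filter domain written in punycode matches the hostname directly.
const matchesPunycode = isSameOrSubdomain(hostname, "xn--i-7iq.ws");

// A filter domain written in unicode no longer matches once decoding is dropped.
const matchesUnicode = isSameOrSubdomain(hostname, "i\u2764.ws");

// U+2764 alone versus U+2764 followed by the variation selector U+FE0F:
// visually similar, but not the same sequence of code points.
const plainHeart = "\u2764";
const emojiHeart = "\u2764\uFE0F";
const sameString = plainHeart === emojiHeart;

console.log({ href, hostname, matchesPunycode, matchesUnicode, sameString });
```

With this approach a filter author would copy the domain exactly as it appears in the DOM or in the network requests, which sidesteps the ambiguity between look-alike unicode spellings.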